Information Criteria for Multivariate CARMA Processes

Vicky Fasen ^ Sebastian Kimmig * * 


Multivariate continuous-time ARMA(p,q) (MCARMA(p,q)) processes are the continuous-time analog of the well-known vector ARMA(p,q) processes. They have attracted interest over the last years. Methods to estimate the parameters of an MCARMA process require an identifiable parametrization such as the Echelon form with a fixed Kronecker index, which is in the one-dimensional case the degree p of the autoregressive polynomial. Thus, the Kronecker index has to be known in advance before the parameter estimation is done. When this is not the case, information criteria can be used to estimate the Kronecker index and the degrees (p,q), respectively. In this paper we investigate information criteria for MCARMA processes based on quasi maximum likelihood estimation. To this end, we first derive the asymptotic properties of quasi maximum likelihood estimators for MCARMA processes in a misspecified parameter space. Then, we present necessary and sufficient conditions for information criteria to be strongly and weakly consistent, respectively. In particular, we study the well-known Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) as special cases.
1 Introduction

In this paper we study necessary and sufficient conditions for weak and strong consistency of information criteria for multivariate continuous-time ARMA(p,q) (MCARMA(p,q)) processes. One-dimensional Gaussian CARMA processes were already investigated by Doob [?] in 1944 and Lévy-driven CARMA processes were propagated at the beginning of this century by Peter Brockwell, see [?] for an overview. An $\mathbb{R}^s$-valued Lévy process $(L(t))_{t\ge 0}$ is a stochastic process in $\mathbb{R}^s$ with independent and stationary increments, $L(0)=0_s$ $\mathbb{P}$-a.s. and càdlàg (continue à droite, limite à gauche) sample

^Institute of Stochastics, Englerstraße 2, D-76131 Karlsruhe, Germany. Email: vicky.fasen@kit.edu
^Financial support by the Deutsche Forschungsgemeinschaft through the research grant FA 809/2-2.

*Institute of Stochastics, Englerstraße 2, D-76131 Karlsruhe, Germany. Email: sebastian.kimmig@kit.edu





paths. Special cases of Lévy processes are Brownian motions and (compound) Poisson processes. Further information on Lévy processes can be found in [2, 5, 26], for example. A formal definition of an MCARMA process was given recently in [23]; see Section 2 of this paper. The idea behind it is that for a two-sided $\mathbb{R}^s$-valued Lévy process $L=(L(t))_{t\in\mathbb{R}}$, i.e. $L(t)=L(t)\mathbf{1}_{\{t\ge 0\}}-\widetilde{L}(-t-)\mathbf{1}_{\{t<0\}}$ where $(\widetilde{L}(t))_{t\ge 0}$ is an independent copy of the Lévy process $(L(t))_{t\ge 0}$, and positive integers $p>q$, a $d$-dimensional MCARMA(p,q) process is the solution to the stochastic differential equation

$$P(D)Y(t)=Q(D)DL(t)\quad\text{for } t\in\mathbb{R},\tag{1.1}$$

where $D$ is the differential operator,

$$P(z):=I_{d\times d}z^{p}+A_1z^{p-1}+\ldots+A_{p-1}z+A_p\tag{1.2}$$

with $A_1,\ldots,A_p\in\mathbb{R}^{d\times d}$ is the autoregressive polynomial and

$$Q(z):=B_0z^{q}+B_1z^{q-1}+\ldots+B_{q-1}z+B_q\tag{1.3}$$

with $B_0,\ldots,B_q\in\mathbb{R}^{d\times s}$ is the moving average polynomial. There are a few papers studying the statistical inference of MCARMA processes, e.g. [?]. In particular, [28] derive the asymptotic behavior of the quasi maximum likelihood estimator (QMLE) under the assumption that the underlying parameter space $\Theta$ with $N(\Theta)$ parameters contains the true parameter and satisfies some identifiability assumptions; see [?] as well. These are typical assumptions for estimation procedures. For a one-dimensional CARMA process we only obtain identifiability when the degree $p$ of the autoregressive polynomial is fixed for all processes generated by parameters in the parameter space; in the multivariate setup the Kronecker index, which specifies in detail the order of the coefficients of the multivariate autoregressive polynomial, has to be fixed. If we know the Kronecker index we know the degree $p$ of the autoregressive polynomial as well. But if we observe data, how do we know the true Kronecker index of the data, so that we can do the parameter estimation in a suitable parameter space $\Theta$? That is the point where we require model selection criteria or, synonymously, information criteria. The most prominent model selection criteria are the Akaike Information Criterion (AIC) introduced in [1] by Akaike, the Schwarz Information Criterion (SIC), also known as BIC (Bayesian Information Criterion), going back to [29], and the Hannan-Quinn criterion in [20]. The AIC approximates the Kullback-Leibler discrepancy, whereas the BIC approximates the Bayesian a posteriori distribution of the different candidate models. The Hannan-Quinn criterion is based on the AIC of Akaike but with a different penalty term to obtain a strongly consistent information criterion. Information criteria for multivariate ARMAX processes and their statistical inference are well-studied in the monograph [19]; see also [?] for an overview of model selection criteria for ARMA processes. An extension of the AIC to multivariate weak ARMA processes is given in [6]. There exist only a few papers investigating information criteria independent of the underlying model, e.g. [30] present very general likelihood-based information criteria and their properties, and [?] derive the BIC. All of these information criteria have in common that they are likelihood-based and choose as candidate model the model for which the information criterion attains the lowest value. They are of the form

$$IC_n(\Theta):=\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)+N(\Theta)\frac{C(n)}{n}.$$

In our setup $Y^n=(Y(h),\ldots,Y(hn))$ is a sample of length $n$ from an MCARMA process, $\widehat{\mathcal{L}}$ is the properly normalized quasi log-likelihood function, $\widehat{\vartheta}^n$ is the QMLE and $C(n)$ is a penalty term. We choose the parameter space for which the information criterion is lowest as the most suitable one: for two parameter spaces $\Theta_1,\Theta_2$ we say that $\Theta_1$ fits the data better than $\Theta_2$ if we have $IC_n(\Theta_1)<IC_n(\Theta_2)$. A strongly consistent information criterion chooses the correct space asymptotically with probability 1, and for a weakly consistent information criterion the convergence to the true space holds in probability. The sequence $C(n)$ can be interpreted as a penalty term for the inclusion of more parameters into the model. Without the penalty term, the criterion would always choose the model with more parameters if we compare two parameter spaces both containing a parameter that generates the data. However, this is not feasible, since the inclusion of too many parameters ultimately leads to an interpolation of the data, such that the model would not provide information about the process generating the data anymore. The employment of an information criterion can therefore be seen as seeking a trade-off between accuracy and complexity.
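To make the trade-off concrete, the following Python sketch compares two candidate parameter spaces through $IC_n(\Theta)=\widehat{\mathcal{L}}+N(\Theta)C(n)/n$. It is only an illustration: the function names and the numerical likelihood values are hypothetical, and the penalties $C(n)=2$ and $C(n)=\log n$ are the usual AIC and BIC choices once the likelihood is normalized by $-2/n$.

```python
import math

def information_criterion(quasi_loglik, n_params, n, penalty):
    """IC_n = normalized quasi log-likelihood + N(Theta) * C(n) / n."""
    return quasi_loglik + n_params * penalty(n) / n

def aic_penalty(n):
    # C(n) = 2 (usual AIC penalty for a likelihood normalized by -2/n)
    return 2.0

def bic_penalty(n):
    # C(n) = log(n) (usual BIC penalty)
    return math.log(n)

def select_space(candidates, n, penalty):
    """candidates maps a label to (minimized quasi log-likelihood, N(Theta))."""
    scores = {name: information_criterion(lik, k, n, penalty)
              for name, (lik, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical values: the larger space fits slightly better but has more parameters.
candidates = {"small": (2.1000, 3), "large": (2.0930, 6)}
n = 1000
best_aic, _ = select_space(candidates, n, aic_penalty)
best_bic, _ = select_space(candidates, n, bic_penalty)
```

With these made-up numbers the weaker AIC penalty selects the larger space while the BIC penalty selects the smaller one, which is the kind of disagreement the consistency results are meant to resolve.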

The rest of the paper is structured in the following way. In Section 2 we present basic facts on MCARMA processes and state space models. Since our information criteria are based on quasi maximum likelihood estimation we first define, in Section 3.1, the quasi log-likelihood function for MCARMA processes and in Section 3.2 the model assumptions. Then, in Section 3.3 we derive the asymptotic normality of the QMLE extending the results given in [28] to a misspecified parameter space. For the proof of strong consistency of the information criteria we require some knowledge about the asymptotic behavior of the quasi log-likelihood function $\widehat{\mathcal{L}}$ as well. For this reason we prove in Section 3.4 a law of the iterated logarithm for the quasi log-likelihood function $\widehat{\mathcal{L}}$. Section 4 contains the main results of the paper: necessary and sufficient conditions for strong and weak consistency of information criteria. In particular, we investigate Gaussian MCARMA processes where the results are explicit. Special information criteria are the AIC and the BIC which are the topic of Section 5. Finally, we conclude with a simulation study in Section 6. The Appendix contains some auxiliary results.


Notation 

We use the notation $\xrightarrow{\mathcal{D}}$ for weak convergence and $\xrightarrow{\mathbb{P}}$ for convergence in probability. For two random vectors $Z_1,Z_2$ the notation $Z_1\overset{\mathcal{D}}{=}Z_2$ means equality in distribution. We use as norms the Euclidean norm $\|\cdot\|$ in $\mathbb{R}^d$ and the spectral norm $\|\cdot\|$ for matrices, which is submultiplicative and induced by the Euclidean norm. Recall that two norms on a finite-dimensional linear space are always equivalent and hence, our results remain true if we replace the Euclidean norm by any other norm. The matrix $0_{d\times s}$ is the zero matrix in $\mathbb{R}^{d\times s}$ and $I_{d\times d}$ is the identity matrix in $\mathbb{R}^{d\times d}$. For a vector $v\in\mathbb{R}^d$ we write $v^T$ for its transpose. For a matrix $A\in\mathbb{R}^{d\times d}$ we denote by $\operatorname{tr}(A)$ its trace, by $\det(A)$ its determinant and by $\lambda_{\max}(A)$ its largest eigenvalue. If $A$ is symmetric and positive semidefinite we write $A^{1/2}$ for the principal square root, i.e. $A^{1/2}$ is the symmetric, positive semidefinite matrix satisfying $A^{1/2}A^{1/2}=A$. For two matrices $A\in\mathbb{R}^{d\times s}$ and $B\in\mathbb{R}^{r\times n}$ we denote by $A\otimes B$ the Kronecker product, which is an element of $\mathbb{R}^{dr\times sn}$. The notation $\operatorname{vec}(A)$ describes the $ds\times 1$ column vector which results from stacking the columns of $A$ beneath each other. The symbols $E$, $\operatorname{Var}$, and $\operatorname{Cov}$ stand for the expectation, variance and covariance operators, respectively. For a sequence of random variables $(X_n)_{n\in\mathbb{N}}$ we say that $X_n$ is $o_{a.s.}(a_n)$ if $|X_n/a_n|\to 0$ as $n\to\infty$ $\mathbb{P}$-a.s. and likewise that $X_n$ is $O_{a.s.}(a_n)$ if $\limsup_{n\to\infty}|X_n/a_n|<\infty$ $\mathbb{P}$-a.s. We write $\partial_i$ for the partial derivative operator with respect to the $i$-th coordinate and $\nabla=(\partial_1,\ldots,\partial_r)$ for the gradient operator in $\mathbb{R}^r$. Finally, by $\partial^2_{i,j}$ we denote the second partial derivative with respect to the coordinates $i$ and $j$, and by $\nabla^2 f$ we denote the Hessian matrix of the function $f$. When there is no ambiguity, we use $\partial_i f(\vartheta_0)$, $\nabla_\vartheta f(\vartheta_0)$ and $\nabla^2_\vartheta f(\vartheta_0)$ as shorthands for $\partial_i f(\vartheta)|_{\vartheta=\vartheta_0}$, $\nabla_\vartheta f(\vartheta)|_{\vartheta=\vartheta_0}$ and $\nabla^2_\vartheta f(\vartheta)|_{\vartheta=\vartheta_0}$, respectively. We interpret $\nabla_\vartheta f(\vartheta)$ as a column vector. In general $C$ denotes a constant which may change from line to line.
2 MCARMA processes and state space processes


We start with the formal definition of an MCARMA process, which can be interpreted as solution of (1.1).

Definition 2.1. Let $(L(t))_{t\in\mathbb{R}}$ be an $\mathbb{R}^s$-valued Lévy process with $E\|L(1)\|^2<\infty$ and let the polynomials $P(z),Q(z)$ be defined as in (1.2) and (1.3) with $p,q\in\mathbb{N}_0$, $q<p$, and $B_q\neq 0_{d\times s}$. Moreover, define

$$A:=\begin{pmatrix}0_{d\times d}&I_{d\times d}&0_{d\times d}&\cdots&0_{d\times d}\\ 0_{d\times d}&0_{d\times d}&I_{d\times d}&\ddots&\vdots\\ \vdots&&\ddots&\ddots&0_{d\times d}\\ 0_{d\times d}&\cdots&\cdots&0_{d\times d}&I_{d\times d}\\ -A_p&-A_{p-1}&\cdots&\cdots&-A_1\end{pmatrix}\in\mathbb{R}^{pd\times pd},$$

$C:=(I_{d\times d},0_{d\times d},\ldots,0_{d\times d})\in\mathbb{R}^{d\times pd}$ and $B:=(\beta_1^T\cdots\beta_p^T)^T\in\mathbb{R}^{pd\times s}$ with

$$\beta_1:=\ldots:=\beta_{p-q-1}:=0_{d\times s}\quad\text{and}\quad\beta_{p-j}:=-\sum_{i=1}^{p-j-1}A_i\beta_{p-j-i}+B_{q-j},\quad j=0,\ldots,q.$$

Assume that the eigenvalues of $A$ have strictly negative real parts. Then the $\mathbb{R}^d$-valued causal MCARMA(p,q) process $Y=(Y(t))_{t\in\mathbb{R}}$ is defined by the state space equation

$$Y(t)=CX(t)\quad\text{for } t\in\mathbb{R},\tag{2.1}$$

where $X$ is the stationary unique solution to the $pd$-dimensional stochastic differential equation

$$dX(t)=AX(t)\,dt+B\,dL(t).\tag{2.2}$$
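The construction in Definition 2.1 can be sketched numerically as follows. This is a minimal illustration using NumPy; the function name `mcarma_state_space` and the argument layout (lists of coefficient matrices) are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def mcarma_state_space(A_coeffs, B_coeffs):
    """Build (A, B, C) as in Definition 2.1.

    A_coeffs = [A_1, ..., A_p], each d x d; B_coeffs = [B_0, ..., B_q], each d x s.
    """
    p, q = len(A_coeffs), len(B_coeffs) - 1
    if q >= p:
        raise ValueError("Definition 2.1 requires q < p")
    d, s = np.shape(B_coeffs[0])

    # Block companion matrix: identity blocks above the diagonal,
    # last block row equal to (-A_p, -A_{p-1}, ..., -A_1).
    A = np.zeros((p * d, p * d))
    A[:-d, d:] = np.eye((p - 1) * d)
    for i in range(p):
        A[-d:, i * d:(i + 1) * d] = -np.asarray(A_coeffs[p - 1 - i], dtype=float)

    # beta_1 = ... = beta_{p-q-1} = 0 and
    # beta_{p-j} = -sum_{i=1}^{p-j-1} A_i beta_{p-j-i} + B_{q-j}, j = q, ..., 0.
    beta = [np.zeros((d, s)) for _ in range(p + 1)]  # beta[0] is unused
    for j in range(q, -1, -1):
        k = p - j
        acc = np.array(B_coeffs[q - j], dtype=float)
        for i in range(1, k):
            acc -= np.asarray(A_coeffs[i - 1], dtype=float) @ beta[k - i]
        beta[k] = acc
    B = np.vstack(beta[1:])

    C = np.hstack([np.eye(d), np.zeros((d, (p - 1) * d))])
    return A, B, C

# Univariate CARMA(2,1) example: P(z) = z^2 + 3z + 2, Q(z) = z + 0.5.
A, B, C = mcarma_state_space([np.array([[3.0]]), np.array([[2.0]])],
                             [np.array([[1.0]]), np.array([[0.5]])])
```

For this example one can check that $C(zI-A)^{-1}B$ coincides with $P(z)^{-1}Q(z)$, which is the relation the coefficients $\beta_j$ are designed to produce.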

In particular, the MCARMA(1,0) process and $X$ in (2.2) are multivariate Ornstein-Uhlenbeck processes. For more details on the well-definedness of the MCARMA(p,q) process see [23]. The class of MCARMA processes is huge. Schlemm and Stelzer [28, Corollary 3.4] showed that the class of continuous-time state space models of the form

$$Y(t)=CX(t)\quad\text{and}\quad dX(t)=AX(t)\,dt+B\,dL(t),\tag{2.3}$$

where $A\in\mathbb{R}^{N\times N}$ has only eigenvalues with strictly negative real parts, $B\in\mathbb{R}^{N\times s}$ and $C\in\mathbb{R}^{d\times N}$, and the class of causal MCARMA processes are equivalent if $E\|L(1)\|^2<\infty$ and $E[L(1)]=0_s$. In general, when we talk about an MCARMA process or a state space model $Y$, respectively, corresponding to $(A,B,C,L)$, we mean that the MCARMA process $Y$ is defined as in (2.3) and shortly write $Y=\mathrm{MCARMA}(A,B,C,L)$.

In this paper we observe the MCARMA process only on a discrete equidistant time-grid with grid distance $h>0$. It is well-known that the Ornstein-Uhlenbeck process $(X(t))_{t\in\mathbb{R}}$ sampled at $h\mathbb{Z}$ is an AR(1)-process with

$$X(kh)=e^{Ah}X((k-1)h)+N_{h,k},\quad k\in\mathbb{Z},$$

where $N_{h,k}=\int_{(k-1)h}^{kh}e^{A(kh-t)}B\,dL(t)$ is a sequence of i.i.d. random vectors. We denote its covariance matrix by $\operatorname{Cov}(N_{h,k})=\Sigma_h$. Hence, $(Y(kh))_{k\in\mathbb{Z}}$ is the output process of the discrete-time state space model

$$Y(kh)=CX(kh)\quad\text{where}\quad X(kh)=e^{Ah}X((k-1)h)+N_{h,k}.\tag{2.4}$$

This discrete-time state space representation is basic for quasi maximum likelihood estimation.
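The two ingredients of (2.4), $e^{Ah}$ and $\Sigma_h=\int_0^h e^{Au}B\Sigma_LB^Te^{A^Tu}\,du$ with $\Sigma_L=\operatorname{Cov}(L(1))$, can be computed numerically. The sketch below uses Van Loan's block-matrix trick, which is a standard numerical device and not a method taken from the paper; the helper `expm` is a simple scaling-and-squaring Taylor approximation written out to keep the example self-contained, and all function names are hypothetical.

```python
import numpy as np

def expm(M, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2 ** squarings)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ X / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def sample_state_space(A, B, Sigma_L, h):
    """Return (e^{Ah}, Sigma_h) for the sampled model (2.4)."""
    N = A.shape[0]
    Q = B @ Sigma_L @ B.T
    # Van Loan block matrix [[-A, Q], [0, A^T]] scaled by h.
    M = np.zeros((2 * N, 2 * N))
    M[:N, :N] = -A
    M[:N, N:] = Q
    M[N:, N:] = A.T
    E = expm(M * h)
    F = E[N:, N:].T           # lower-right block is e^{A^T h}
    Sigma_h = F @ E[:N, N:]   # e^{Ah} times the upper-right block
    return F, Sigma_h

# Scalar Ornstein-Uhlenbeck check: A = -1, B = 1, Var(L(1)) = 1, h = 0.5.
F, Sigma_h = sample_state_space(np.array([[-1.0]]), np.array([[1.0]]),
                                np.array([[1.0]]), 0.5)
```

In the scalar example the result can be compared with the closed form $\Sigma_h=(1-e^{-2h})/2$.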

3 Quasi maximum likelihood estimation 

3.1 Definition 

Since the MCARMA process observed at discrete equidistant time points is a discrete-time state space model as given in (2.4), we use quasi maximum likelihood estimation for discrete-time state space models with respect to identification issues. We now review the most important aspects of estimation as it is done in [28] for MCARMA processes. The estimation is based on the Kalman filter, which calculates the linear innovations of a Gaussian discrete-time state space model; originally introduced in [?] and described in a time series context in [?, §12.2].

Definition 3.1. Let $(Z_k)_{k\in\mathbb{Z}}$ be an $\mathbb{R}^d$-valued stationary stochastic process with finite second moments. The linear innovations $\varepsilon=(\varepsilon_k)_{k\in\mathbb{Z}}$ are then defined as $\varepsilon_k=Z_k-P_{k-1}Z_k$, where $P_k$ denotes the orthogonal projection onto the space $\overline{\operatorname{span}}\{Z_j:-\infty<j\le k\}$ and the closure is taken in $L^2$.

Note that this definition ensures that the innovations of such a process are stationary, uncorrelated and have mean 0. In the following we calculate the linear innovations of $(Y(kh))_{k\in\mathbb{Z}}$. For this purpose, let $\Omega$ be the solution to the discrete-time Riccati equation

$$\Omega=e^{Ah}\Omega e^{A^Th}+\Sigma_h-\left(e^{Ah}\Omega C^T\right)\left(C\Omega C^T\right)^{-1}\left(e^{Ah}\Omega C^T\right)^T,$$

which exists by [28, Proposition 2.1 i)]. Then, the Kalman gain matrix is

$$K=\left(e^{Ah}\Omega C^T\right)\left(C\Omega C^T\right)^{-1}.$$

The linear innovations of $(Y(kh))_{k\in\mathbb{Z}}$ can be calculated as

$$\varepsilon_k=Y(kh)-C\widehat{X}_k,\quad k\in\mathbb{Z},\quad\text{with}\quad\widehat{X}_k=\sum_{j=1}^{\infty}\left(e^{Ah}-KC\right)^{j-1}KY((k-j)h).\tag{3.1}$$

The covariance matrix of the innovations is $V:=E[\varepsilon_k\varepsilon_k^T]=C\Omega C^T$. If we observe data we unfortunately do not know the model parameter behind it and hence, we have to calculate the so-called pseudo-innovations. In the following we assume that our data set is generated by a continuous-time state space model $(A,B,C,L)$, i.e. $Y=\mathrm{MCARMA}(A,B,C,L)$. Moreover, we have a parametric family $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)$ of MCARMA models with $\vartheta$ in the parameter space $\Theta\subset\mathbb{R}^{N(\Theta)}$, $N(\Theta)\in\mathbb{N}$.
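As a rough numerical companion to the Riccati equation and the Kalman gain above, the following sketch iterates the Riccati map until it stabilizes. The plain fixed-point iteration, the starting value $\Omega=\Sigma_h$, the tolerance and the function name are all assumptions of this example; convergence is taken for granted here and not proved.

```python
import numpy as np

def steady_state_kalman(F, C, Sigma_h, tol=1e-12, max_iter=10_000):
    """Iterate Omega -> F Omega F^T + Sigma_h - G (C Omega C^T)^{-1} G^T,
    with G = F Omega C^T and F = e^{Ah}; return (Omega, K, V)."""
    Omega = np.array(Sigma_h, dtype=float)
    for _ in range(max_iter):
        V = C @ Omega @ C.T
        G = F @ Omega @ C.T
        K = G @ np.linalg.inv(V)
        Omega_next = F @ Omega @ F.T + Sigma_h - K @ G.T
        converged = np.max(np.abs(Omega_next - Omega)) < tol
        Omega = Omega_next
        if converged:
            break
    V = C @ Omega @ C.T                      # innovation covariance
    K = F @ Omega @ C.T @ np.linalg.inv(V)   # steady-state Kalman gain
    return Omega, K, V

# Scalar example: the state is observed directly, so Omega = Sigma_h and K = F.
Omega, K, V = steady_state_kalman(np.array([[0.5]]), np.array([[1.0]]),
                                  np.array([[2.0]]))
```

In the scalar example the Riccati equation reduces to $\Omega=\Sigma_h$, so the output can be checked by hand.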

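The aim is to find $\vartheta_0\in\Theta$ such that $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$. Therefore, we calculate for every $\vartheta\in\Theta$ the steady-state Kalman gain matrix $K_\vartheta$ and covariance matrix $V_\vartheta$ via the discrete-time Riccati equation

$$\Omega_\vartheta=e^{A_\vartheta h}\Omega_\vartheta e^{A_\vartheta^Th}+\Sigma_{h,\vartheta}-\left(e^{A_\vartheta h}\Omega_\vartheta C_\vartheta^T\right)\left(C_\vartheta\Omega_\vartheta C_\vartheta^T\right)^{-1}\left(e^{A_\vartheta h}\Omega_\vartheta C_\vartheta^T\right)^T$$

as

$$K_\vartheta=\left(e^{A_\vartheta h}\Omega_\vartheta C_\vartheta^T\right)\left(C_\vartheta\Omega_\vartheta C_\vartheta^T\right)^{-1}$$

and

$$V_\vartheta=C_\vartheta\Omega_\vartheta C_\vartheta^T,$$

where $\Sigma_{h,\vartheta}$ is the analog of $\Sigma_h$ for the model associated to $\vartheta$. Based on this the pseudo-innovations are defined as

$$\varepsilon_{\vartheta,k}=Y(kh)-C_\vartheta\widehat{X}_{\vartheta,k},\quad k\in\mathbb{Z},\quad\text{with}\quad\widehat{X}_{\vartheta,k}=\sum_{j=1}^{\infty}\left(e^{A_\vartheta h}-K_\vartheta C_\vartheta\right)^{j-1}K_\vartheta Y((k-j)h).$$

Note that $(\widehat{X}_{\vartheta,k})_{k\in\mathbb{Z}}$ can also be calculated recursively by

$$\widehat{X}_{\vartheta,k}=\left(e^{A_\vartheta h}-K_\vartheta C_\vartheta\right)\widehat{X}_{\vartheta,k-1}+K_\vartheta Y((k-1)h).$$

For $\vartheta_0$ such that $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ the pseudo-innovations $(\varepsilon_{\vartheta_0,k})_{k\in\mathbb{N}}$ are the innovations as given in Definition 3.1 and $V_{\vartheta_0}=E[\varepsilon_{\vartheta_0,1}\varepsilon_{\vartheta_0,1}^T]$. With this, $-2/n$ times the Gaussian log-likelihood of the model associated to $\vartheta$ is

$$\mathcal{L}(\vartheta,Y^n)=\frac{1}{n}\sum_{k=1}^{n}\left(d\log(2\pi)+\log(\det(V_\vartheta))+\varepsilon_{\vartheta,k}^TV_\vartheta^{-1}\varepsilon_{\vartheta,k}\right).\tag{3.2}$$

Defining

$$l_{\vartheta,k}:=d\log(2\pi)+\log(\det(V_\vartheta))+\varepsilon_{\vartheta,k}^TV_\vartheta^{-1}\varepsilon_{\vartheta,k},\quad k\in\mathbb{Z},$$

we can also write $\mathcal{L}(\vartheta,Y^n)=\frac{1}{n}\sum_{k=1}^{n}l_{\vartheta,k}$. The expectation of this random variable is

$$\mathscr{Q}(\vartheta):=E[\mathcal{L}(\vartheta,Y^n)].$$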

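In practical scenarios it is not possible to calculate the pseudo-innovations, as they are defined in terms of the full history of the process $Y$ but we have only finitely many observations. Suppose now that we have $n$ observations of the output process $Y$, contained in the sample $Y^n=(Y(h),\ldots,Y(nh))$. We therefore need a method to approximate the pseudo-innovations based on this finite sample. We initialize the filter at $k=1$ by prescribing $\widehat{X}_{\vartheta,1}=\widehat{X}_{\vartheta,\mathrm{initial}}$ and use the recursion

$$\widehat{X}_{\vartheta,k}=\left(e^{A_\vartheta h}-K_\vartheta C_\vartheta\right)\widehat{X}_{\vartheta,k-1}+K_\vartheta Y((k-1)h),\quad k\ge 2,$$

$$\widehat{\varepsilon}_{\vartheta,k}=Y(kh)-C_\vartheta\widehat{X}_{\vartheta,k},\quad k\in\mathbb{N}.$$

The $\widehat{\varepsilon}_{\vartheta,k}$ are denoted as approximate pseudo-innovations. Substituting the approximate pseudo-innovations for their theoretical counterparts in (3.2), we obtain the quasi log-likelihood function as

$$\widehat{\mathcal{L}}(\vartheta,Y^n):=\frac{1}{n}\sum_{k=1}^{n}\left(d\log(2\pi)+\log(\det(V_\vartheta))+\widehat{\varepsilon}_{\vartheta,k}^TV_\vartheta^{-1}\widehat{\varepsilon}_{\vartheta,k}\right).\tag{3.3}$$

The QMLE based on the sample $Y^n$ is then given by

$$\widehat{\vartheta}^n:=\operatorname*{argmin}_{\vartheta\in\Theta}\widehat{\mathcal{L}}(\vartheta,Y^n).\tag{3.4}$$

The idea is that $\widehat{\vartheta}^n$ is an estimator for the pseudo-true parameter

$$\vartheta^*:=\operatorname*{argmin}_{\vartheta\in\Theta}\mathscr{Q}(\vartheta).\tag{3.5}$$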
The function $\mathscr{Q}$ attains its minimum at $\vartheta^*$ in the space $\Theta$. However, if we minimize only over $\Theta$ and $\Theta$ does not contain a parameter generating $Y$ then it is not clear that the minimum, and hence $\vartheta^*$, is uniquely defined. On the other hand, if there is a $\vartheta_0\in\Theta$ with $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ then $\vartheta^*=\vartheta_0$. The last case was investigated in [28].
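The recursion for the approximate pseudo-innovations and the quasi log-likelihood (3.3) translate directly into a short loop. The sketch below assumes that $e^{A_\vartheta h}$ (called `F`), $K_\vartheta$, $C_\vartheta$ and $V_\vartheta$ have already been computed for a fixed $\vartheta$; the function name and the default zero initialization of the filter are choices made for this example only.

```python
import numpy as np

def quasi_log_likelihood(Y, F, K, C, V, x_init=None):
    """Evaluate (3.3); Y has shape (n, d) with rows Y(h), ..., Y(nh)."""
    n, d = Y.shape
    # Filter state X_hat_{theta,1}; zero is an assumed default initialization.
    x_hat = np.zeros(F.shape[0]) if x_init is None else np.asarray(x_init, dtype=float)
    V_inv = np.linalg.inv(V)
    _, logdet = np.linalg.slogdet(V)
    const = d * np.log(2 * np.pi) + logdet
    total = 0.0
    for k in range(n):
        if k > 0:
            # X_hat_k = (e^{Ah} - K C) X_hat_{k-1} + K Y((k-1)h)
            x_hat = (F - K @ C) @ x_hat + K @ Y[k - 1]
        eps = Y[k] - C @ x_hat          # approximate pseudo-innovation
        total += const + eps @ V_inv @ eps
    return total / n

# Degenerate check: with K = 0 the filter state stays at zero and eps_k = Y(kh).
Y = np.array([[1.0], [2.0], [3.0]])
value = quasi_log_likelihood(Y, np.array([[0.5]]), np.array([[0.0]]),
                             np.array([[1.0]]), np.array([[2.0]]))
```

A QMLE as in (3.4) would then be obtained by minimizing this function over $\vartheta$ with a numerical optimizer, recomputing `F`, `K`, `C` and `V` for each candidate $\vartheta$.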

3.2 Assumptions 

In this section we give the model assumptions which we require for the asymptotic results on the QMLE $\widehat{\vartheta}^n$. The next definition introduces the concept of minimal algebraic realizations of matrix polynomials, which is essential in describing identifiable parametrizations of MCARMA processes.

Definition 3.2. Let $H$ be a $d\times s$ rational matrix function, i.e. a $d\times s$ matrix whose entries are rational functions of the variable $z\in\mathbb{R}$.

(a) A matrix triple $(A,B,C)$ is called an algebraic realization of $H$ of dimension $N$ if $H(z)=C(zI_{N\times N}-A)^{-1}B$ for every $z\in\mathbb{R}$, where $A\in\mathbb{R}^{N\times N}$, $B\in\mathbb{R}^{N\times s}$ and $C\in\mathbb{R}^{d\times N}$.

(b) A minimal realization of $H$ is an algebraic realization of $H$ of dimension smaller or equal to the dimension of every other algebraic realization of $H$. The dimension of a minimal realization of $H$ is the McMillan degree of $H$.
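Minimality of a given triple can be checked numerically. The sketch below relies on the classical Kalman rank criterion from linear systems theory (a realization is minimal if and only if it is controllable and observable), which is a standard fact and not a statement taken from this paper; the function names and the tolerance are assumptions of the example.

```python
import numpy as np

def transfer_function(A, B, C, z):
    """Evaluate H(z) = C (z I - A)^{-1} B."""
    N = A.shape[0]
    return C @ np.linalg.solve(z * np.eye(N) - A, B)

def is_minimal(A, B, C, tol=1e-10):
    """Kalman rank test: controllability and observability matrices have rank N."""
    N = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(N)])
    obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(N)])
    return bool(np.linalg.matrix_rank(ctrb, tol=tol) == N
                and np.linalg.matrix_rank(obsv, tol=tol) == N)

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # P(z) = (z + 1)(z + 2)
C = np.array([[1.0, 0.0]])
B_min = np.array([[1.0], [-2.5]])          # Q(z) = z + 0.5, no common zero with P
B_red = np.array([[1.0], [-2.0]])          # Q(z) = z + 1 cancels a zero of P
minimal_ok = is_minimal(A, B_min, C)
minimal_cancel = is_minimal(A, B_red, C)
```

In the second case the common factor $z+1$ can be cancelled, so a realization of dimension 1 exists and the two-dimensional triple is not minimal.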

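We now present the assumptions we use in the development of the asymptotic theory of the QMLE:

Assumption B.

B.1 The parameter space $\Theta$ is a compact subset of $\mathbb{R}^{N(\Theta)}$.

B.2 For each $\vartheta\in\Theta$, it holds that $E[L_\vartheta(1)]=0_s$, $E\|L_\vartheta(1)\|^2<\infty$ and the covariance matrix $\Sigma^L_\vartheta=E[L_\vartheta(1)L_\vartheta(1)^T]$ is non-singular.

B.3 For each $\vartheta\in\Theta$, the eigenvalues of $A_\vartheta$ have strictly negative real parts and are elements of $\{z\in\mathbb{C}:-\frac{\pi}{h}<\operatorname{Im}(z)<\frac{\pi}{h}\}$.

B.4 The pseudo-true parameter $\vartheta^*$ as defined in (3.5) is an element of the interior of $\Theta$.

B.5 For the Lévy process $L$ which drives the observed process $Y$ there exists a positive number $\delta$ such that $E\|L(1)\|^{4+\delta}<\infty$.

B.6 For every $\epsilon>0$ there exists a $\delta(\epsilon)>0$ such that

$$\mathscr{Q}(\vartheta^*)<\min_{\vartheta\in B_\epsilon(\vartheta^*)^c\cap\Theta}\mathscr{Q}(\vartheta)-\delta(\epsilon),$$

where $B_\epsilon(\vartheta^*)$ is the open ball with center $\vartheta^*$ and radius $\epsilon$.

B.7 The Fisher information matrix of the QMLE is non-singular.

B.8 The functions $\vartheta\mapsto A_\vartheta$, $\vartheta\mapsto B_\vartheta$, $\vartheta\mapsto C_\vartheta$ and $\vartheta\mapsto\Sigma^L_\vartheta$ are three times continuously differentiable. Moreover, for each $\vartheta\in\Theta$, the matrix $C_\vartheta$ has full rank.

B.9 For all $\vartheta\in\Theta$, the triple $(A_\vartheta,B_\vartheta,C_\vartheta)$ is minimal with McMillan degree $N$.

B.10 The family of output processes $(\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta))_{\vartheta\in\Theta}$ is identifiable from the spectral density.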
Remark 3.3.

(a) Every process in the family $(\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta))_{\vartheta\in\Theta}$ has a different spectral density by B.10. Moreover, by B.9 it is also ensured for two parameter spaces $\Theta$ and $\Theta'$ both satisfying Assumption B with different McMillan degrees that the processes generated by parameters in $\Theta$ are different from the processes generated by parameters in $\Theta'$.

(b) Assumption B.6 is a property called identifiable uniqueness. It makes sure that $\vartheta^*$ is the unique minimum of $\mathscr{Q}(\vartheta)$ in $\Theta$ (cf. [?, p. 28]). In the correctly specified case, i.e. when the space $\Theta$ contains $\vartheta_0$ with $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$, the identifiable uniqueness follows from some properties satisfied by the innovations associated to the true parameter $\vartheta_0$, i.e. Assumption B.6 can then be dropped without any replacement.

(c) In case of a correctly specified parameter space, we can replace Assumption B.7 by the assumption that there exists a positive index $j_0$ such that the $[(j_0+2)d^2]\times r$ matrix

$$\nabla_\vartheta\begin{pmatrix}\left[I_{(j_0+1)\times(j_0+1)}\otimes K_\vartheta^T\otimes C_\vartheta\right]\begin{pmatrix}\operatorname{vec}\exp(I_{N\times N}h)\\ \operatorname{vec}\exp(A_\vartheta h)\\ \vdots\\ \operatorname{vec}\exp(A_\vartheta^{j_0}h)\end{pmatrix}\\ \operatorname{vec}V_\vartheta\end{pmatrix}_{\vartheta=\vartheta_0}$$

has rank $r$. This condition is used in [28] as Assumption C11 and guarantees the desired non-singularity.

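Remark 3.4. An MCARMA process $(A,B,C,L)$ in Echelon form with Kronecker index $m=(m_1,\ldots,m_d)$ has the property that $A=(A_{ij})_{i,j=1,\ldots,d}\in\mathbb{R}^{N\times N}$ is a block matrix with blocks $A_{ij}\in\mathbb{R}^{m_i\times m_j}$ given by

$$A_{ij}=\begin{pmatrix}0&\cdots&\cdots&\cdots&\cdots&0\\ \vdots&&&&&\vdots\\ 0&\cdots&\cdots&\cdots&\cdots&0\\ \alpha_{ij,1}&\cdots&\alpha_{ij,\min(m_i+\mathbf{1}_{\{i>j\}},m_j)}&0&\cdots&0\end{pmatrix}+\delta_{i,j}\begin{pmatrix}0&1&0&\cdots&0\\ \vdots&&\ddots&\ddots&\vdots\\ 0&\cdots&\cdots&0&1\\ 0&\cdots&\cdots&\cdots&0\end{pmatrix}$$

and

$$C=\begin{pmatrix}\begin{array}{c}(1\ \ 0\ \cdots\ 0)\\ 0_{(d-1)\times m_1}\end{array}&\begin{array}{c}0_{1\times m_2}\\ (1\ \ 0\ \cdots\ 0)\\ 0_{(d-2)\times m_2}\end{array}&\cdots&\begin{array}{c}0_{(d-1)\times m_d}\\ (1\ \ 0\ \cdots\ 0)\end{array}\end{pmatrix}\in\mathbb{R}^{d\times N},$$

where the $i$th block column of $C$ has $m_i$ columns. The matrix $B=(b_{ij})\in\mathbb{R}^{N\times s}$ is unrestricted. Moreover, the polynomials $P(z)=[p_{ij}(z)]$ and $Q(z)=[q_{ij}(z)]$ are of the form

$$p_{ij}(z)=\delta_{i,j}z^{m_i}-\sum_{k=1}^{\min(m_i+\mathbf{1}_{\{i>j\}},m_j)}\alpha_{ij,k}z^{k-1}\quad\text{and}\quad q_{ij}(z)=\sum_{k=1}^{m_i}\kappa_{m_1+\ldots+m_{i-1}+k,j}z^{k-1},$$

where $\kappa_{i,j}$ is the $(i,j)$th entry of the matrix $K=TB$, where $T=(T_{ij})_{i,j=1,\ldots,d}\in\mathbb{R}^{N\times N}$ is a block matrix with blocks $T_{ij}\in\mathbb{R}^{m_i\times m_j}$ given by

$$T_{ij}=\begin{pmatrix}-\alpha_{ij,2}&\cdots&-\alpha_{ij,\min(m_i+\mathbf{1}_{\{i>j\}},m_j)}&0&\cdots&0\\ \vdots&&&&&\vdots\\ -\alpha_{ij,\min(m_i+\mathbf{1}_{\{i>j\}},m_j)}&0&\cdots&\cdots&\cdots&0\\ 0&\cdots&\cdots&\cdots&\cdots&0\\ \vdots&&&&&\vdots\\ 0&\cdots&\cdots&\cdots&\cdots&0\end{pmatrix}+\delta_{i,j}\begin{pmatrix}0&0&\cdots&0&1\\ 0&0&\cdots&1&0\\ \vdots&&&&\vdots\\ 0&1&\cdots&0&0\\ 1&0&\cdots&0&0\end{pmatrix}.$$

This means that the Kronecker index specifies the degrees of the polynomials on the diagonal of the autoregressive polynomial $P(z)$; the off-diagonal polynomials have a degree of at most $\min(m_i+\mathbf{1}_{\{i>j\}},m_j)$. In particular, we can calculate the degree $p=\max_{i=1,\ldots,d}m_i$ of the autoregressive polynomial. Moreover, the polynomials $P$ and $Q$ can be calculated explicitly from $A$, $B$ and $C$. It is important that an MCARMA process in Echelon form fulfills the smoothness and identifiability assumptions B.8, B.9 and B.10. A special subclass of MCARMA processes in Echelon form are the one-dimensional CARMA processes, for which the degree $p$ of the autoregressive polynomial is fixed and the zeros of $P$ and $Q$ are distinct. This class corresponds to the class of CARMA processes in Echelon form with Kronecker index $p$. For more details on MCARMA processes in Echelon form we refer to [28, Section 4.1].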

3.3 Asymptotic normality 

The next proposition collects auxiliary results which are used in the proof of the asymptotic normality 
of the QMLE. They are highlighted here separately for easier reference, because they will appear 
again later in a different context. 

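Proposition 3.5.

(a) Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumptions B.1 to B.3 as well as B.5. Then, there exists a pseudo-true parameter $\vartheta^*$ as defined in Equation (3.5) and for every $n\in\mathbb{N}$, there exists

$$\vartheta_n^*=\operatorname*{argmin}_{\vartheta\in\Theta}E\left[\widehat{\mathcal{L}}(\vartheta,Y^n)\right]\tag{3.6}$$

as well. If $\Theta$ also satisfies the other parts of Assumption B then $\vartheta_n^*\to\vartheta^*$ as $n\to\infty$. In particular, for $n$ sufficiently large $\vartheta_n^*$ is in the interior of $\Theta$ as well.

(b) Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumptions B.1 to B.9. Then the strong law of large numbers

$$\widehat{\mathcal{L}}(\vartheta,Y^n)\to\mathscr{Q}(\vartheta)\quad\mathbb{P}\text{-a.s.}$$

holds uniformly in $\vartheta$ as $n\to\infty$.

(c) Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Then, as $n\to\infty$,

$$\sqrt{n}\,\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\xrightarrow{\mathcal{D}}\mathcal{N}(0,\mathcal{I}(\vartheta^*)),$$

where $\mathcal{I}(\vartheta^*)=\lim_{n\to\infty}n\operatorname{Var}(\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n))$.

(d) Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumptions B.1 to B.9. Then the convergence

$$\nabla^2_\vartheta\widehat{\mathcal{L}}(\vartheta,Y^n)\to\mathcal{J}(\vartheta)\quad\mathbb{P}\text{-a.s.}$$

holds uniformly in $\vartheta$ as $n\to\infty$, where $\mathcal{J}(\vartheta):=E[\nabla^2_\vartheta l_{\vartheta,1}]$.

(e) Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Then there exist $\epsilon,a>0$ such that for almost all $\omega$ and for every $n>n_1(\omega)$ and $\vartheta\in B_\epsilon(\vartheta^*)\cap\Theta$ we have

$$\det\left(\nabla^2_\vartheta\widehat{\mathcal{L}}(\vartheta,Y^n)(\omega)\right)>a.$$

Proof. (a) The existence statements follow directly from [?, Proposition 3.1]. The convergence $\vartheta_n^*\to\vartheta^*$ follows from Lemma A.2 (d).

(b) This is exactly [28, Lemma 2.8] taking [28, Lemma 3.14] into account.

(c) Note that under Assumption B we have $\nabla_\vartheta\mathscr{Q}(\vartheta^*)=\nabla_\vartheta E[\mathcal{L}(\vartheta^*,Y^n)]=0$. Next, we use the dominated convergence theorem to interchange the expectation and derivation, giving

$$E\left[\nabla_\vartheta\mathcal{L}(\vartheta^*,Y^n)\right]=\nabla_\vartheta E\left[\mathcal{L}(\vartheta^*,Y^n)\right]=0.\tag{3.7}$$

The rest of the proof can be carried out as in [28, Lemma 2.16].

(d) The pointwise convergence can be proved as in [28, Lemma 2.17], respectively [7, Lemma 2 and Lemma 3] taking [28, Lemma 3.14] into account. The stronger statement of uniform convergence can be shown by using the compactness of the parameter space analogous to the proof of [28, Lemma 2.16], respectively [?, Theorem 16].

(e) Assumption B.7 says that the Fisher information matrix $E[\nabla^2_\vartheta l_{\vartheta^*,1}]$ is invertible and hence, $\det(E[\nabla^2_\vartheta l_{\vartheta^*,1}])>0$. Moreover, by Assumption B.8 the map $\vartheta\mapsto E[\nabla^2_\vartheta l_{\vartheta,1}]$ is continuous. Thus, there exist $\epsilon,a>0$ such that $\inf_{\vartheta\in B_\epsilon(\vartheta^*)\cap\Theta}\det(E[\nabla^2_\vartheta l_{\vartheta,1}])>a$. Since by (d) as $n\to\infty$,

$$\sup_{\vartheta\in B_\epsilon(\vartheta^*)\cap\Theta}\left\|\nabla^2_\vartheta\widehat{\mathcal{L}}(\vartheta,Y^n)-E\left[\nabla^2_\vartheta l_{\vartheta,1}\right]\right\|\to 0\quad\mathbb{P}\text{-a.s.},$$

we finally get $\lim_{n\to\infty}\inf_{\vartheta\in B_\epsilon(\vartheta^*)\cap\Theta}\det(\nabla^2_\vartheta\widehat{\mathcal{L}}(\vartheta,Y^n))>a$ $\mathbb{P}$-a.s. □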

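We can now state the desired central limit theorem, which basically combines [30, Proposition 4.1] and [28, Theorem 3.16].

Theorem 3.6. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Then, as $n\to\infty$,

$$\widehat{\vartheta}^n\to\vartheta^*\quad\mathbb{P}\text{-a.s.},$$

and

$$\sqrt{n}\left(\widehat{\vartheta}^n-\vartheta^*\right)\xrightarrow{\mathcal{D}}\mathcal{N}\left(0,\mathcal{J}(\vartheta^*)^{-1}\mathcal{I}(\vartheta^*)\mathcal{J}(\vartheta^*)^{-1}\right),$$

where

$$\mathcal{I}(\vartheta^*)=\lim_{n\to\infty}n\operatorname{Var}\left(\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\right)\quad\text{and}\quad\mathcal{J}(\vartheta^*)=\lim_{n\to\infty}\nabla^2_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n).\tag{3.8}$$

Proof. The proof can be carried out in the same way as [28, Theorem 3.16, Theorem 2.4 and Theorem 2.5], respectively, replacing $\vartheta_0$ by $\vartheta^*$ wherever it appears. Note that we have the additional Assumption B.6 concerning identifiable uniqueness, which ensures that the estimator converges to a unique limit, see also [?, Theorem 3.4]. □

Remark 3.7.

(a) For the strong consistency part of the theorem, Assumption B.8 can be relaxed requiring only continuity instead of three times differentiability.

(b) In the case that we are in a correctly specified parameter space, this theorem corresponds exactly to [28, Theorem 3.16].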


3.4 Law of the iterated logarithm 

This section is devoted to the development of various forms of the law of the iterated logarithm which 
we need to study the consistency properties of the information criteria. In the following proposition 
we start by establishing a law of the iterated logarithm for linear combinations of partial derivatives 
of the quasi log-likelihood function. 


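Proposition 3.8. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Then, for every $x\in\mathbb{R}^{N(\Theta)}\setminus\{0_{N(\Theta)}\}$ it holds that

$$\limsup_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\,x^T\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)=\sqrt{2\cdot x^T\mathcal{I}(\vartheta^*)x}\quad\mathbb{P}\text{-a.s.},$$

$$\liminf_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\,x^T\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)=-\sqrt{2\cdot x^T\mathcal{I}(\vartheta^*)x}\quad\mathbb{P}\text{-a.s.}$$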

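Proof. Let $x\in\mathbb{R}^{N(\Theta)}\setminus\{0_{N(\Theta)}\}$. First, it can be deduced that $x^T\mathcal{I}(\vartheta^*)x$ is finite and positive from [28, Lemma 2.16]. Moreover, by [28, Eq. (2.24)] the representation

$$\partial_il_{\vartheta^*,k}=\operatorname{tr}\left(V_{\vartheta^*}^{-1}\left(I_{d\times d}-\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^TV_{\vartheta^*}^{-1}\right)\partial_iV_{\vartheta^*}\right)+2\left(\partial_i\varepsilon_{\vartheta^*,k}^T\right)V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\tag{3.9}$$

holds. By Lemma A.1 we know that both the pseudo-innovations and their partial derivatives can be expressed as moving averages of the true output process via

$$\varepsilon_{\vartheta^*,k}=\sum_{v=0}^{\infty}c_{\vartheta^*,v}Y((k-v)h),\qquad\partial_i\varepsilon_{\vartheta^*,k}=\sum_{v=0}^{\infty}c^{(i)}_{\vartheta^*,v}Y((k-v)h)\tag{3.10}$$

and the inequalities $\sup_{\vartheta\in\Theta}\|c_{\vartheta,v}\|\le C\rho^v$ and $\sup_{\vartheta\in\Theta}\|c^{(i)}_{\vartheta,v}\|\le C\rho^v$ are satisfied for some $C>0$ and $\rho\in(0,1)$ for $i\in\{1,\ldots,N(\Theta)\}$. Thus, $x^T\nabla_\vartheta l_{\vartheta^*,k}=\sum_{i=1}^{N(\Theta)}x_i\partial_il_{\vartheta^*,k}$ can be written as $f(Y(kh),Y((k-1)h),\ldots)$ for a suitable function $f$.

The aim is now to apply the law of the iterated logarithm for dependent random variables as it is given in [24, Theorem 8], for which we need to check the following three conditions:

(a) $E[x^T\nabla_\vartheta l_{\vartheta^*,k}]=0$ and $E|x^T\nabla_\vartheta l_{\vartheta^*,k}|^{2+\delta_1}<\infty$ for some $\delta_1>0$.

(b) $E\left|x^T\nabla_\vartheta l_{\vartheta^*,k}-E\left[x^T\nabla_\vartheta l_{\vartheta^*,k}\mid\sigma(Y((k-m)h),\ldots,Y(kh),\ldots,Y((k+m)h))\right]\right|^2=O(m^{-2-\delta_2})$ for some $\delta_2>0$ and $m\in\mathbb{N}$.

(c) $\sum_{k=1}^{\infty}\alpha_{Y^{(h)}}(k)^{\frac{\delta_3}{2+\delta_3}}<\infty$ for some $0<\delta_3<\delta_1$, where $(\alpha_{Y^{(h)}}(k))_{k\in\mathbb{N}}$ denotes the strong mixing coefficients of the process $(Y(kh))_{k\in\mathbb{Z}}$.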


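(a) We start with the first condition. For the first part it follows as in (3.7) that $E[\partial_il_{\vartheta^*,k}]=0$ for every $i\in\{1,\ldots,N(\Theta)\}$, hence $E[x^T\nabla_\vartheta l_{\vartheta^*,k}]=0$. For the second part, for any $i\in\{1,\ldots,N(\Theta)\}$ we employ (3.9) and the Cauchy-Schwarz inequality to obtain

$$E\left|\partial_il_{\vartheta^*,k}\right|^{2+\delta_1}\le CE\left|\operatorname{tr}\left(V_{\vartheta^*}^{-1}\left(I_{d\times d}-\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^TV_{\vartheta^*}^{-1}\right)\partial_iV_{\vartheta^*}\right)\right|^{2+\delta_1}+CE\left|\left(\partial_i\varepsilon_{\vartheta^*,k}^T\right)V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\right|^{2+\delta_1}$$

$$\le C\left(1+E\|\varepsilon_{\vartheta^*,k}\|^{4+2\delta_1}+\left(E\|\varepsilon_{\vartheta^*,k}\|^{4+2\delta_1}E\|\partial_i\varepsilon_{\vartheta^*,k}\|^{4+2\delta_1}\right)^{\frac12}\right),$$

where we have used the compactness of $\Theta$ in the last line. From Assumption B.5 we know that the driving Lévy process $L$ of $Y$ has finite $(4+\delta)$th moment for some $\delta>0$, which carries over to the $(4+\delta)$th moment of $Y(kh)$, $k\in\mathbb{Z}$, and hence to $\varepsilon_{\vartheta^*,k}$ and $\partial_i\varepsilon_{\vartheta^*,k}$. With this, we obtain that the right-hand side is finite if $\delta_1<\frac{\delta}{2}$. Since $i\in\{1,\ldots,N(\Theta)\}$ is arbitrary and $x^T\nabla_\vartheta l_{\vartheta^*,k}$ is a linear combination of those components, we get $E|x^T\nabla_\vartheta l_{\vartheta^*,k}|^{2+\delta_1}<\infty$.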

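(b) For the second condition, we begin by decomposing the partial derivative as in the proof of [28, Lemma 2.16]. For $m\in\mathbb{N}$ we write

$$\partial_il_{\vartheta^*,k}=Y^{(i)}_{m,k}-E\left[Y^{(i)}_{m,k}\right]+Z^{(i)}_{m,k}-E\left[Z^{(i)}_{m,k}\right],$$

where

$$Y^{(i)}_{m,k}:=\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_iV_{\vartheta^*}\right)+\sum_{v,v'=0}^{m}\Big(-\operatorname{tr}\left(V_{\vartheta^*}^{-1}c_{\vartheta^*,v}Y((k-v)h)Y^T((k-v')h)c_{\vartheta^*,v'}^TV_{\vartheta^*}^{-1}\partial_iV_{\vartheta^*}\right)+2Y^T((k-v)h)c^{(i)T}_{\vartheta^*,v}V_{\vartheta^*}^{-1}c_{\vartheta^*,v'}Y((k-v')h)\Big),$$

and $Z^{(i)}_{m,k}$ collects the remaining terms of the double sum, i.e. those with $\max(v,v')>m$. Hence, we obtain

$$E\left|x^T\nabla_\vartheta l_{\vartheta^*,k}-E\left[x^T\nabla_\vartheta l_{\vartheta^*,k}\mid\sigma(Y((k-m)h),\ldots,Y(kh),\ldots,Y((k+m)h))\right]\right|^2\le E\left|\sum_{i=1}^{N(\Theta)}x_i\left(Z^{(i)}_{m,k}-E\left[Z^{(i)}_{m,k}\right]\right)\right|^2$$

$$=\sum_{i=1}^{N(\Theta)}x_i^2\operatorname{Var}\left(Z^{(i)}_{m,k}\right)+2\sum_{\substack{i,j=1\\ i<j}}^{N(\Theta)}x_ix_j\operatorname{Cov}\left(Z^{(i)}_{m,k},Z^{(j)}_{m,k}\right).$$

From step 2 of the proof of [28, Lemma 2.16] we know that $\operatorname{Cov}(Z^{(i)}_{m,k},Z^{(j)}_{m,k})\le C\rho^m$ for a positive constant $C$ and $\rho\in(0,1)$, and every $i,j\in\{1,\ldots,N(\Theta)\}$. Thus, the second condition is satisfied as well.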

(c) Lastly, we turn to the third condition. By [23, Proposition 3.34] the strong mixing coefficients $\alpha_Y(t)$ of $(Y(t))_{t\in\mathbb{R}}$ are $O(e^{-at})$ for some $a>0$, which carries over to those of the sampled process $(Y(kh))_{k\in\mathbb{Z}}$. Thus, we can choose $\delta_3<\delta_1<\frac{\delta}{2}$ to obtain $\sum_{k=1}^{\infty}\alpha_{Y^{(h)}}(k)^{\frac{\delta_3}{2+\delta_3}}<\infty$ as desired.

Then a consequence of (a)-(c) and [24, Theorem 8] is the law of the iterated logarithm

$$\limsup_{n\to\infty}\frac{\sum_{k=1}^{n}x^T\nabla_\vartheta l_{\vartheta^*,k}}{\sqrt{2n\,x^T\mathcal{I}(\vartheta^*)x\,\log(\log(n\,x^T\mathcal{I}(\vartheta^*)x))}}=1\quad\mathbb{P}\text{-a.s.}$$

Since $\log(\log(n\,x^T\mathcal{I}(\vartheta^*)x))=O(\log(\log(n)))$ we can therefore deduce the statement by symmetry (the driving Lévy process has expectation $0_s$) for $\mathcal{L}$. Finally, by Lemma A.2 (b) we can transfer the result to $\widehat{\mathcal{L}}$ as well. □

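The next theorem builds upon this to derive a multivariate version of the law of the iterated logarithm.

Theorem 3.9. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Moreover, let $\Sigma\in\mathbb{R}^{N(\Theta)\times N(\Theta)}$ be an arbitrary matrix. Then it holds that

$$\limsup_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\left\|\Sigma\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\right\|=\sqrt{2\cdot\lambda_{\max}\left(\Sigma\mathcal{I}(\vartheta^*)\Sigma^T\right)}\quad\mathbb{P}\text{-a.s.}$$

Proof. First, since $\mathcal{I}(\vartheta^*)=\lim_{n\to\infty}n\operatorname{Var}(\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n))$ (cf. Proposition 3.5 (c)), it holds that

$$\lim_{n\to\infty}n\operatorname{Var}\left(\Sigma\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\right)=\Sigma\mathcal{I}(\vartheta^*)\Sigma^T.$$

An application of Proposition 3.8 gives

$$\limsup_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\,x^T\Sigma\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)=\sqrt{2\cdot x^T\Sigma\mathcal{I}(\vartheta^*)\Sigma^Tx}\quad\mathbb{P}\text{-a.s.}$$

for every $x\in\mathbb{R}^{N(\Theta)}\setminus\{0_{N(\Theta)}\}$. Just as in the proof of [?, Lemma 2], we can conclude from this that $\mathbb{P}$-a.s.

$$\limsup_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\left\|\Sigma\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\right\|=\limsup_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\sup_{\|x\|=1}x^T\Sigma\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)=\sup_{\|x\|=1}\sqrt{2\cdot x^T\Sigma\mathcal{I}(\vartheta^*)\Sigma^Tx}.$$

□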


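Having this theorem allows us to derive a variant of the law of the iterated logarithm for the function $\widehat{\mathcal{L}}$.

Theorem 3.10. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Then

$$\limsup_{n\to\infty}\frac{n}{\log(\log(n))}\left(\widehat{\mathcal{L}}(\vartheta^*,Y^n)-\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)\right)=\lambda_{\max}\left(\mathcal{J}(\vartheta^*)^{-\frac12}\mathcal{I}(\vartheta^*)\mathcal{J}(\vartheta^*)^{-\frac12}\right)\quad\mathbb{P}\text{-a.s.}$$

Proof. A first-order Taylor expansion of $\nabla_\vartheta\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)$ around $\vartheta^*$ gives

$$0=\nabla_\vartheta\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)=\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)+\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n)(\widehat{\vartheta}^n-\vartheta^*),$$

for some $\check{\vartheta}^n$ with $\|\check{\vartheta}^n-\vartheta^*\|\le\|\widehat{\vartheta}^n-\vartheta^*\|$. Since by Theorem 3.6 we know that $\widehat{\vartheta}^n\to\vartheta^*$ $\mathbb{P}$-a.s., $\check{\vartheta}^n\to\vartheta^*$ $\mathbb{P}$-a.s. as well. A conclusion of Proposition 3.5 (e) is that $\lim_{n\to\infty}\det(\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n))>0$ $\mathbb{P}$-a.s., so that

$$\widehat{\vartheta}^n-\vartheta^*=-\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n)^{-1}\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\quad\mathbb{P}\text{-a.s.}\tag{3.11}$$

is well-defined. Now we employ a Taylor expansion again, albeit this time we expand $\widehat{\mathcal{L}}(\vartheta^*,Y^n)$ around $\widehat{\vartheta}^n$ and use a second-order expansion. This gives us

$$\widehat{\mathcal{L}}(\vartheta^*,Y^n)=\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)+\frac12(\widehat{\vartheta}^n-\vartheta^*)^T\nabla^2_\vartheta\widehat{\mathcal{L}}(\bar{\vartheta}^n,Y^n)(\widehat{\vartheta}^n-\vartheta^*),$$

for some $\bar{\vartheta}^n$ with $\|\bar{\vartheta}^n-\widehat{\vartheta}^n\|\le\|\widehat{\vartheta}^n-\vartheta^*\|$, where we have used $\nabla_\vartheta\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)=0$. As above we have $\bar{\vartheta}^n\to\vartheta^*$ $\mathbb{P}$-a.s. Rearranging the terms and inserting (3.11), we arrive at

$$\widehat{\mathcal{L}}(\vartheta^*,Y^n)-\widehat{\mathcal{L}}(\widehat{\vartheta}^n,Y^n)=\frac12\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)^T\,\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n)^{-1}\,\nabla^2_\vartheta\widehat{\mathcal{L}}(\bar{\vartheta}^n,Y^n)\,\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n)^{-1}\,\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n).\tag{3.12}$$

An application of Theorem 3.9 with $\Sigma=\mathcal{J}(\vartheta^*)^{-\frac12}$ (which is symmetric) yields

$$\limsup_{n\to\infty}\frac{\sqrt{n}}{\sqrt{\log(\log(n))}}\left\|\mathcal{J}(\vartheta^*)^{-\frac12}\nabla_\vartheta\widehat{\mathcal{L}}(\vartheta^*,Y^n)\right\|=\sqrt{2\cdot\lambda_{\max}\left(\mathcal{J}(\vartheta^*)^{-\frac12}\mathcal{I}(\vartheta^*)\mathcal{J}(\vartheta^*)^{-\frac12}\right)}\quad\mathbb{P}\text{-a.s.}$$

With $\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n)^{-1}\nabla^2_\vartheta\widehat{\mathcal{L}}(\bar{\vartheta}^n,Y^n)\nabla^2_\vartheta\widehat{\mathcal{L}}(\check{\vartheta}^n,Y^n)^{-1}\to\mathcal{J}(\vartheta^*)^{-1}$ $\mathbb{P}$-a.s. (cf. Proposition 3.5 (d)) and (3.12) we can derive the statement. □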


Remark 3.11. This result is an analog to [?, Proposition 5.1], which investigates consistency of information criteria under some different model assumptions. However, it is stronger than the one in the cited article, since we are able to specify the limit superior exactly while in [?] it is only shown that convergence occurs.


4 Likelihood-based information criteria 

In this main section we derive properties for likelihood-based information criteria of the following 
form. 


Definition 4.1. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Furthermore, let $\widehat\vartheta^n$ be the QMLE based on $Y^n$ in $\Theta$ as defined in (3.4) and let $C(n)$ be a positive, nondecreasing function of $n$ with

$$\lim_{n\to\infty}\frac{C(n)}{n}=0.$$

Then a likelihood-based information criterion has the form

$$IC_n(\Theta):=\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)+N(\Theta)\frac{C(n)}{n}.\qquad(4.1)$$



These information criteria have the property that $IC_n(\Theta)\to\mathcal Q(\vartheta^*)$. Since $\mathcal Q$ attains its minimum at $\vartheta_0$ for which $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ (cf. Lemma A.3), we choose the parameter space for which the information criterion is minimal. The condition $C(n)/n\to0$ guarantees that underfitting is not possible, i.e. there is no positive probability of choosing a parameter space which cannot generate the process underlying the data. However, $C(n)/n\to0$ is not sufficient to exclude overfitting, i.e. a positive probability of choosing a space with more parameters than necessary. In the following we will give necessary and sufficient conditions to exclude this case. To this end we need some notation.
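
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented for this sketch, and the normalized quasi log-likelihood values are hypothetical numbers, not results from the paper. Only the form of (4.1) and the two penalties $C(n)=2$ and $C(n)=\log(n)$ (treated in Section 5) are taken from the text.

```python
import math


def information_criterion(quasi_loglik, n_params, n_obs, penalty):
    """Evaluate IC_n = L + N * C(n) / n as in (4.1).

    quasi_loglik is the (already normalized) value of the quasi
    log-likelihood at the QMLE; penalty is the function C(n).
    """
    return quasi_loglik + n_params * penalty(n_obs) / n_obs


def select_space(candidates, n_obs, penalty):
    """Return the name of the minimizing space and all scores."""
    scores = {
        name: information_criterion(value, n_params, n_obs, penalty)
        for name, (value, n_params) in candidates.items()
    }
    return min(scores, key=scores.get), scores


def constant_penalty(n):
    return 2.0  # C(n) = 2


def log_penalty(n):
    return math.log(n)  # C(n) = log(n)


# Hypothetical likelihood values: the larger space fits slightly better.
candidates = {"larger_space": (1.2340, 10), "smaller_space": (1.2365, 8)}

best_constant, _ = select_space(candidates, 2000, constant_penalty)
best_log, _ = select_space(candidates, 2000, log_penalty)
print(best_constant, best_log)
```

With these made-up numbers the constant penalty keeps the larger space while the logarithmic penalty switches to the smaller one, which is the overfitting effect the following results quantify.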

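Definition 4.2. Let $\Theta$ and $\Theta_0$ be parameter spaces with associated families of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0}$ and $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$, respectively, satisfying Assumption B. Assume that there is a $\vartheta_0\in\Theta_0$ with $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$. We say that $\Theta_0$ is nested in $\Theta$ if $N(\Theta_0)<N(\Theta)$ and there exist a matrix $F\in\mathbb R^{N(\Theta)\times N(\Theta_0)}$ with $F^TF=I_{N(\Theta_0)\times N(\Theta_0)}$ as well as a $c\in\mathbb R^{N(\Theta)}$ such that

$$(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0}=(A_{F\vartheta+c},B_{F\vartheta+c},C_{F\vartheta+c},L_{F\vartheta+c})_{\vartheta\in\Theta_0}.$$

The interpretation of nested is that all processes generated by a parameter in $\Theta_0$ can also be generated by a parameter in $\Theta$. However, there are also processes which can be generated by a parameter in $\Theta$, but not by a parameter in $\Theta_0$. In this sense $\Theta_0$ is contained in $\Theta$. The condition $F^TF=I_{N(\Theta_0)\times N(\Theta_0)}$ guarantees that we have a bijective map from $\Theta_0$ to $F\Theta_0+c\subset\Theta$.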

For MCARMA processes parametrized in Echelon form, a parameter space $\Theta$ that satisfies Assumption B contains only processes that have the same Kronecker index $m=(m_1,\dots,m_d)$ and hence, fixed degree $p=\max_{i=1,\dots,d}m_i$ of the AR polynomial. However, for the MA polynomial we only know that the degree is less than or equal to $p-1$. In this context $\Theta_0$ could be a parameter space generating processes with Kronecker index $m_0$ and MA degree not exceeding $q_0$, where $\Theta$ generates processes with Kronecker index $m_0$ and MA degree not exceeding $q$, $q_0<q\le p_0-1$. Then $\Theta_0$ is nested in $\Theta$. In this way our information criteria can be used to estimate the Kronecker index, the degree of the AR polynomial and the degree of the MA polynomial.

In the following we investigate only parameter spaces with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)$ in Echelon form. Let the Kronecker index, the degree of the AR polynomial and the degree of the MA polynomial belonging to $Y$ be denoted by $m_0$, $p_0$ and $q_0$, respectively. Then $\Theta_0^E$ denotes the parameter space generating all MCARMA processes with Kronecker index $m_0$. The degree of the AR polynomial of those processes is then $p_0$, the degree of the MA polynomial is between $0$ and $p_0-1$. The space $\Theta_0^E$ is the biggest parameter space generating MCARMA processes in Echelon form, satisfying Assumption B and containing a parameter $\vartheta_0^E$ with $\mathrm{MCARMA}(A_{\vartheta_0^E},B_{\vartheta_0^E},C_{\vartheta_0^E},L_{\vartheta_0^E})=Y$. Note that $\vartheta_0^E$ is then the pseudo-true parameter in $\Theta_0^E$.

Next, we define under which circumstances $IC_n$ is consistent; we distinguish two different types of consistency.

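Definition 4.3.

(a) The information criterion $IC_n$ is called strongly consistent if for any parameter spaces $\Theta_0$ and $\Theta$ with associated families of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0}$ and $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$, respectively, satisfying Assumption B and with a $\vartheta_0\in\Theta_0$ such that $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$, and either $\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)\neq Y$ for every $\vartheta\in\Theta$ or $\Theta_0$ being nested in $\Theta$ we have

$$\mathbb P\Big(\limsup_{n\to\infty}\big(IC_n(\Theta_0)-IC_n(\Theta)\big)<0\Big)=1.$$

(b) The information criterion $IC_n$ is called weakly consistent if for any parameter spaces $\Theta_0$ and $\Theta$ with associated families of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0}$ and $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$, respectively, satisfying Assumption B and with a $\vartheta_0\in\Theta_0$ such that $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$, and either $\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)\neq Y$ for every $\vartheta\in\Theta$ or $\Theta_0$ being nested in $\Theta$ we have

$$\lim_{n\to\infty}\mathbb P\big(IC_n(\Theta_0)-IC_n(\Theta)<0\big)=1.$$
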

If the information criterion is strongly consistent, then the chosen parameter space converges almost surely to the true parameter space. For a weakly consistent information criterion we only have convergence in probability. Moreover, if we compare two parameter spaces both containing a parameter that generates the true output process, then we asymptotically choose the parameter space with fewer parameters almost surely in the strongly consistent case, whereas in the weakly consistent case this holds in probability. In particular, overfitting is asymptotically excluded.

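With these notions we characterize consistency of $IC_n$ for MCARMA processes in terms of the penalty term $C(n)$.

Theorem 4.4.

(a) The criterion $IC_n$ is strongly consistent if

$$\limsup_{n\to\infty}\frac{C(n)}{\log(\log(n))}>\lambda_{\max}\big(\mathcal J(\vartheta_0^E)^{-1/2}\mathcal I(\vartheta_0^E)\mathcal J(\vartheta_0^E)^{-1/2}\big).$$

The information criterion is not strongly consistent if $\limsup_{n\to\infty}C(n)/\log(\log(n))=0$.

(b) The criterion $IC_n$ is weakly consistent if $\limsup_{n\to\infty}C(n)=\infty$. If $\limsup_{n\to\infty}C(n)<\infty$ then $IC_n$ is neither weakly nor strongly consistent.

(c) Let $\Theta$ and $\Theta_0$ be parameter spaces with associated families of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0}$ and $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$, respectively, satisfying Assumption B. Assume that there is a $\vartheta_0\in\Theta_0$ with $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ and $\Theta_0$ is nested in $\Theta$ with map $F$. Moreover, suppose $\limsup_{n\to\infty}C(n)=C<\infty$. With $\vartheta^*$ the pseudo-true parameter in $\Theta$, define

$$M_F(\vartheta^*):=\mathcal J(\vartheta^*)^{-1}-F\big(F^T\mathcal J(\vartheta^*)F\big)^{-1}F^T.$$

Then

$$\lim_{n\to\infty}\mathbb P\big(IC_n(\Theta_0)-IC_n(\Theta)>0\big)=\mathbb P\Big(\sum_{i=1}^{N(\Theta)-N(\Theta_0)}\lambda_i\chi_i^2>2[N(\Theta)-N(\Theta_0)]C\Big)>0,$$

where $(\chi_i^2)$ is a sequence of independent $\chi^2$ random variables with one degree of freedom and the $\lambda_i$ are the $N(\Theta)-N(\Theta_0)$ strictly positive eigenvalues of $\mathcal I(\vartheta^*)^{1/2}M_F(\vartheta^*)\mathcal I(\vartheta^*)^{1/2}$.
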

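Proof. For the whole proof, we denote by $\vartheta_0$ the parameter in $\Theta_0$ with $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ and by $\vartheta^*$ the pseudo-true parameter in $\Theta$. Moreover, let $\widehat\vartheta_0^n$ denote the QMLE based on $Y^n$ in $\Theta_0$, $\widehat\vartheta^n$ the QMLE based on $Y^n$ in $\Theta$ and $\widehat\vartheta_E^n$ the QMLE based on $Y^n$ in $\Theta_0^E$. The corresponding quasi log-likelihood functions are denoted by $\widehat{\mathcal L}_0$, $\widehat{\mathcal L}$ and $\widehat{\mathcal L}_E$, respectively.
(a) We distinguish two different cases.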

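Case 1: $\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)\neq Y$ for every $\vartheta\in\Theta$. Then

$$IC_n(\Theta_0)-IC_n(\Theta)=\widehat{\mathcal L}_0(\widehat\vartheta_0^n,Y^n)-\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)+[N(\Theta_0)-N(\Theta)]\frac{C(n)}{n}.\qquad(4.2)$$

On the one hand, by Theorem 3.10 we have that

$$\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)=\widehat{\mathcal L}(\vartheta^*,Y^n)+O_{a.s.}\Big(\frac{\log(\log(n))}{n}\Big),\qquad
\widehat{\mathcal L}_0(\widehat\vartheta_0^n,Y^n)=\widehat{\mathcal L}_0(\vartheta_0,Y^n)+O_{a.s.}\Big(\frac{\log(\log(n))}{n}\Big),$$

and on the other hand, by Proposition 3.5(b),

$$\widehat{\mathcal L}(\vartheta^*,Y^n)=\mathcal Q(\vartheta^*)+o_{a.s.}(1)\quad\text{and}\quad\widehat{\mathcal L}_0(\vartheta_0,Y^n)=\mathcal Q_0(\vartheta_0)+o_{a.s.}(1).$$

Finally, in this case the inequality from eq. (A.1) is strict, so that for some $\delta>0$

$$IC_n(\Theta_0)-IC_n(\Theta)=\mathcal Q_0(\vartheta_0)-\mathcal Q(\vartheta^*)+r(n)+[N(\Theta_0)-N(\Theta)]\frac{C(n)}{n}\le-\delta+r(n)+[N(\Theta_0)-N(\Theta)]\frac{C(n)}{n},$$

where $r(n)$ is $o_{a.s.}(1)$. By assumption it holds that $C(n)/n\to0$ as $n\to\infty$, so that we get

$$\mathbb P\Big(\limsup_{n\to\infty}\big(IC_n(\Theta_0)-IC_n(\Theta)\big)\le-\delta\Big)=1.$$
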

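Case 2: $\Theta_0$ is nested in $\Theta$ with map $F$. Note that $\Theta_0$ is also nested in $\Theta_0^E$ by definition, which then in turn means that $\Theta$ is nested in $\Theta_0^E$, implying

$$\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)=\min_{\vartheta\in\Theta}\widehat{\mathcal L}(\vartheta,Y^n)\ge\min_{\vartheta\in\Theta_0^E}\widehat{\mathcal L}_E(\vartheta,Y^n)=\widehat{\mathcal L}_E(\widehat\vartheta_E^n,Y^n).\qquad(4.3)$$

Moreover, $\varepsilon_{\vartheta_0,k}=\varepsilon_{\vartheta_0^E,k}=\varepsilon_k$ and hence,

$$\widehat{\mathcal L}_0(\vartheta_0,Y^n)=\widehat{\mathcal L}_E(\vartheta_0^E,Y^n).\qquad(4.4)$$

With this, the minimality of $\widehat\vartheta_0^n$ in $\Theta_0$ and (4.3) we obtain

$$\widehat{\mathcal L}_0(\widehat\vartheta_0^n,Y^n)-\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)\le\widehat{\mathcal L}_E(\vartheta_0^E,Y^n)-\widehat{\mathcal L}_E(\widehat\vartheta_E^n,Y^n).$$

Now, Theorem 3.10 tells us that

$$\limsup_{n\to\infty}\frac{n}{\log(\log(n))}\Big(\widehat{\mathcal L}_E(\vartheta_0^E,Y^n)-\widehat{\mathcal L}_E(\widehat\vartheta_E^n,Y^n)\Big)=\lambda_{\max}\big(\mathcal J(\vartheta_0^E)^{-1/2}\mathcal I(\vartheta_0^E)\mathcal J(\vartheta_0^E)^{-1/2}\big)\qquad\mathbb P\text{-a.s.}$$

Turning to the information criterion, this gives

$$\limsup_{n\to\infty}\frac{n}{\log(\log(n))}\big(IC_n(\Theta_0)-IC_n(\Theta)\big)
\le\limsup_{n\to\infty}\Big(\frac{n}{\log(\log(n))}\big(\widehat{\mathcal L}_E(\vartheta_0^E,Y^n)-\widehat{\mathcal L}_E(\widehat\vartheta_E^n,Y^n)\big)+[N(\Theta_0)-N(\Theta)]\frac{C(n)}{\log(\log(n))}\Big)$$
$$\le\lambda_{\max}\big(\mathcal J(\vartheta_0^E)^{-1/2}\mathcal I(\vartheta_0^E)\mathcal J(\vartheta_0^E)^{-1/2}\big)-\limsup_{n\to\infty}\frac{C(n)}{\log(\log(n))}\qquad\mathbb P\text{-a.s.},$$

since $N(\Theta_0)-N(\Theta)\le-1$. Hence, if $\limsup_{n\to\infty}\frac{C(n)}{\log(\log(n))}>\lambda_{\max}\big(\mathcal J(\vartheta_0^E)^{-1/2}\mathcal I(\vartheta_0^E)\mathcal J(\vartheta_0^E)^{-1/2}\big)$, we obtain

$$\mathbb P\Big(\limsup_{n\to\infty}\frac{n}{\log(\log(n))}\big(IC_n(\Theta_0)-IC_n(\Theta)\big)<0\Big)=1.$$

Finally, if $\limsup_{n\to\infty}C(n)/\log(\log(n))=0$, then from $\widehat{\mathcal L}_0(\widehat\vartheta_0^n,Y^n)-\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)\ge0$ it clearly follows that

$$\mathbb P\Big(\limsup_{n\to\infty}\frac{n}{\log(\log(n))}\big(IC_n(\Theta_0)-IC_n(\Theta)\big)\ge0\Big)=1,$$

so that strong consistency cannot hold.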

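(b) Again we distinguish the two cases from part (a). Case 1 is dealt with analogously as in (a), so that we only need to give detailed arguments for case 2. Suppose therefore that $\Theta_0$ is nested in $\Theta$. Define the map $f:\Theta_0\to\Theta$ by $f(\vartheta)=F\vartheta+c$, where $F$ and $c$ are as in the definition of nested spaces. Then, a Taylor expansion of $\widehat{\mathcal L}$ around $\widehat\vartheta^n$ results in

$$\widehat{\mathcal L}(f(\widehat\vartheta_0^n),Y^n)-\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)=\frac12\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)^T\nabla^2_\vartheta\widehat{\mathcal L}(\check\vartheta^n,Y^n)\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)\qquad(4.5)$$

with $\check\vartheta^n$ such that $\|\check\vartheta^n-\widehat\vartheta^n\|\le\|f(\widehat\vartheta_0^n)-\widehat\vartheta^n\|$. Plugging (4.5) into (4.2) gives

$$IC_n(\Theta_0)-IC_n(\Theta)=\frac12\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)^T\nabla^2_\vartheta\widehat{\mathcal L}(\check\vartheta^n,Y^n)\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)+[N(\Theta_0)-N(\Theta)]\frac{C(n)}{n}.\qquad(4.6)$$

In order to be able to show weak consistency, we will study the behavior of the random variable $\widehat\vartheta^n-f(\widehat\vartheta_0^n)$. Note that $\widehat{\mathcal L}_0(\vartheta,Y^n)=\widehat{\mathcal L}(f(\vartheta),Y^n)$ for $\vartheta\in\Theta_0$, so that by the chain rule

$$\nabla_\vartheta\widehat{\mathcal L}_0(\vartheta_0,Y^n)=F^T\nabla_\vartheta\widehat{\mathcal L}(f(\vartheta_0),Y^n)=F^T\nabla_\vartheta\widehat{\mathcal L}(\vartheta^*,Y^n).$$

Moreover,

$$f(\widehat\vartheta_0^n)-\vartheta^*=f(\widehat\vartheta_0^n)-f(\vartheta_0)=F(\widehat\vartheta_0^n-\vartheta_0).$$

As in (3.11), we also have

$$\widehat\vartheta^n-\vartheta^*=-\big(\nabla^2_\vartheta\widehat{\mathcal L}(\overline\vartheta^n,Y^n)\big)^{-1}\nabla_\vartheta\widehat{\mathcal L}(\vartheta^*,Y^n),\qquad
\widehat\vartheta_0^n-\vartheta_0=-\big(\nabla^2_\vartheta\widehat{\mathcal L}_0(\overline\vartheta_0^n,Y^n)\big)^{-1}\nabla_\vartheta\widehat{\mathcal L}_0(\vartheta_0,Y^n),$$

where $\overline\vartheta^n$ is such that $\|\overline\vartheta^n-\vartheta^*\|\le\|\widehat\vartheta^n-\vartheta^*\|$ and $\overline\vartheta_0^n$ is such that $\|\overline\vartheta_0^n-\vartheta_0\|\le\|\widehat\vartheta_0^n-\vartheta_0\|$. In particular, $\overline\vartheta^n\to\vartheta^*$ and $\overline\vartheta_0^n\to\vartheta_0$ $\mathbb P$-a.s. as $n\to\infty$. To summarize,

$$\sqrt n\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)=-\Big[\big(\nabla^2_\vartheta\widehat{\mathcal L}(\overline\vartheta^n,Y^n)\big)^{-1}-F\big(\nabla^2_\vartheta\widehat{\mathcal L}_0(\overline\vartheta_0^n,Y^n)\big)^{-1}F^T\Big]\sqrt n\,\nabla_\vartheta\widehat{\mathcal L}(\vartheta^*,Y^n).$$

An application of Proposition 3.5(c) and (d) results in

$$\sqrt n\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)\xrightarrow{\ \mathcal D\ }-\big[\mathcal J(\vartheta^*)^{-1}-F\mathcal J_0(\vartheta_0)^{-1}F^T\big]\,\mathcal N(0,\mathcal I(\vartheta^*))=:N_F.$$

Since by the chain rule $\mathcal J_0(\vartheta_0)=F^T\mathcal J(\vartheta^*)F$, the random vector $N_F$ is distributed as $\mathcal N(0,M_F(\vartheta^*)\mathcal I(\vartheta^*)M_F(\vartheta^*))$ with $M_F(\vartheta^*)=\mathcal J(\vartheta^*)^{-1}-F(F^T\mathcal J(\vartheta^*)F)^{-1}F^T$ (note that $M_F(\vartheta^*)$ is symmetric). Finally, by (4.6), Proposition 3.5(d) and $C(n)\to\infty$ as $n\to\infty$,

$$\mathbb P\big(IC_n(\Theta_0)-IC_n(\Theta)<0\big)
=\mathbb P\Big(\frac12\sqrt n\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)^T\nabla^2_\vartheta\widehat{\mathcal L}(\check\vartheta^n,Y^n)\sqrt n\big(\widehat\vartheta^n-f(\widehat\vartheta_0^n)\big)<-[N(\Theta_0)-N(\Theta)]C(n)\Big)
\xrightarrow{n\to\infty}\mathbb P\big(N_F^T\mathcal J(\vartheta^*)N_F<\infty\big)=1.$$

Using [?, Eq. (1.1)] gives $N_F^T\mathcal J(\vartheta^*)N_F=\sum_i\lambda_i\chi_i^2$, where $(\chi_i^2)$ is a sequence of independent $\chi^2$ random variables with one degree of freedom and the $\lambda_i$ are the eigenvalues of $\mathcal I(\vartheta^*)^{1/2}M_F(\vartheta^*)\mathcal J(\vartheta^*)M_F(\vartheta^*)\mathcal I(\vartheta^*)^{1/2}=\mathcal I(\vartheta^*)^{1/2}M_F(\vartheta^*)\mathcal I(\vartheta^*)^{1/2}$. Since $\operatorname{rank}(M_F(\vartheta^*))=N(\Theta)-N(\Theta_0)$ and $\mathcal J(\vartheta^*)$ and $\mathcal I(\vartheta^*)$ have full rank, the number of strictly positive eigenvalues of this matrix is $N(\Theta)-N(\Theta_0)$. Hence, the result follows.

(c) With the arguments in (b) we obtain the statement. □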
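
The limit in Theorem 4.4(c) is a tail probability of a weighted sum of independent $\chi^2_1$ variables, which in general has no simple closed form. A minimal Monte Carlo sketch in Python is given below; the function names, the eigenvalues and the penalty limit are assumptions made for illustration and are not values from the paper. For two equal weights $\lambda$ the tail is $\exp(-t/(2\lambda))$, which the sketch uses as a sanity check.

```python
import math
import random


def weighted_chi2_tail(weights, threshold, n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(sum_i w_i * Z_i^2 > threshold), Z_i i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        total = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if total > threshold:
            hits += 1
    return hits / n_draws


def overfitting_probability(eigenvalues, penalty_limit, **kwargs):
    """Limit probability of Theorem 4.4(c).

    The number of strictly positive eigenvalues plays the role of
    N(Theta) - N(Theta_0), so the threshold is 2 * len(eigenvalues) * C.
    """
    threshold = 2.0 * len(eigenvalues) * penalty_limit
    return weighted_chi2_tail(eigenvalues, threshold, **kwargs)


# Hypothetical inputs: two equal eigenvalues 1.5 and penalty limit C = 1.
estimate = overfitting_probability([1.5, 1.5], penalty_limit=1.0)
exact = math.exp(-4.0 / (2 * 1.5))  # closed form for two equal weights
print(round(estimate, 3), round(exact, 3))
```

For unequal eigenvalues the same function applies unchanged; only the closed-form check is then unavailable.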


Remark 4.5.

(a) A conclusion of Theorem 4.4(a) is that strong consistency of the information criterion always holds, independent of the process $Y$ generating the observed data and hence of $\vartheta_0^E$, if

$$\limsup_{n\to\infty}C(n)/\log(\log(n))=\infty.$$

(b) Let $\Theta_0$ be nested in $\Theta$ with map $F$. Then it can be shown as in the proof of Theorem 3.10 that

$$\limsup_{n\to\infty}\frac{n}{\log(\log(n))}\big(IC_n(\Theta_0)-IC_n(\Theta)\big)
=\lambda_{\max}\big(\mathcal I(\vartheta^*)^{1/2}M_F(\vartheta^*)\mathcal I(\vartheta^*)^{1/2}\big)+\limsup_{n\to\infty}[N(\Theta_0)-N(\Theta)]\frac{C(n)}{\log(\log(n))},$$

where $M_F(\vartheta^*)=\mathcal J(\vartheta^*)^{-1}-F(F^T\mathcal J(\vartheta^*)F)^{-1}F^T$. This implies that the information criterion $IC_n$ is not strongly consistent iff $\limsup_{n\to\infty}C(n)/\log(\log(n))\le C^*$, where

$$C^*:=\max_F\frac{\lambda_{\max}\big(\mathcal I(\vartheta^*)^{1/2}M_F(\vartheta^*)\mathcal I(\vartheta^*)^{1/2}\big)}{N(\Theta)-N(\Theta_0)}.$$

Since the structure of $\mathcal I(\vartheta^*)$ and $\mathcal J(\vartheta^*)$ is in general not known, it is difficult to calculate $C^*$ explicitly. However, in the Gaussian case we will derive that $C^*=2$ (cf. Corollary 4.6).

(c) We would like to note that these results are similar to the statement of [?, Corollary 5.3] under different model assumptions. However, the authors present only sufficient conditions for strong consistency, whereas we also have a necessary condition (see Remark 3.11 as well).

(d) As the proof of Theorem 4.4(a), Case 1, shows, for spaces $\Theta$ with $\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)\neq Y$ for every $\vartheta\in\Theta$ a necessary and sufficient condition for choosing the correct parameter space asymptotically with probability 1 is $\lim_{n\to\infty}C(n)/n=0$. Only if we allow nested models as well does the additional condition $\limsup_{n\to\infty}C(n)/\log(\log(n))>C^*$ become necessary. The probability in Theorem 4.4(c) is the overfitting probability.


To wrap up this section, we want to study the special case where the observed MCARMA process 
is driven by a Brownian motion. Some of the technical auxiliary results for the proof are given in the 
appendix. 

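Corollary 4.6. Assume that the Lévy process $L$ which drives the observed process $Y$ is a Brownian motion. Then $IC_n$ is strongly consistent iff $\limsup_{n\to\infty}C(n)/\log(\log(n))>2$.

Proof. From Lemma A.5(b) we know that there exists a space $\Theta_0$ such that there is a $\vartheta_0\in\Theta_0$ with $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ and $\Theta_0$ is nested in $\Theta_0^E$ with map $F$. Moreover, $N(\Theta_0)=N(\Theta_0^E)-1$ and

$$\lambda_{\max}\big(\mathcal I(\vartheta_0^E)^{1/2}M_F(\vartheta_0^E)\mathcal I(\vartheta_0^E)^{1/2}\big)=2,\qquad M_F(\vartheta):=\mathcal J(\vartheta)^{-1}-F(F^T\mathcal J(\vartheta)F)^{-1}F^T.$$

Additionally, a conclusion of Lemma A.5(a) is that

$$\lambda_{\max}\big(\mathcal J(\vartheta_0^E)^{-1/2}\mathcal I(\vartheta_0^E)\mathcal J(\vartheta_0^E)^{-1/2}\big)=2.$$

Therefore the statement follows directly from Theorem 4.4(a) and Remark 4.5(b). □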


The results of this section are analogous to the ones obtained for ARMAX processes with i.i.d. noise in [19, Theorem 5.5.1].


5 AIC and BIC

In this section, we transfer the two most well-known information criteria, the AIC and BIC, to the MCARMA framework, highlight the main ideas in their development and apply the results of Section 4 to them.

5.1 The Akaike Information Criterion (AIC) 

Historically, Akaike’s idea was to study the Kullback-Leibler discrepancy of different models and 
choose the one which minimizes this quantity. In this section, we give arguments why this approach 
is also sensible in the case of MCARMA models. 
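As a starting point, let $g,f$ be probability densities on $\mathbb R^n$. Then the Kullback-Leibler discrepancy between $g$ and $f$ is

$$K(f\mid g):=\int_{\mathbb R^n}\log\Big(\frac{g(x)}{f(x)}\Big)g(x)\,dx\ \ge\ 0.$$

Equality holds only for $g=f$ (cf. [?, p. 302]). Let now $(f_\vartheta)_{\vartheta\in\Theta}$ be a family of densities on $\mathbb R^n$ and fix one "true" density $f_{\vartheta_0}$. With $\mathbb E_{\vartheta_0}$ we denote the expectation regarding the distribution with density $f_{\vartheta_0}$. Then, the density that comes closest to $f_{\vartheta_0}$ in the Kullback-Leibler sense is given by the one associated to

$$\operatorname*{argmin}_{\vartheta\in\Theta}K(f_\vartheta\mid f_{\vartheta_0})=\operatorname*{argmin}_{\vartheta\in\Theta}\big\{\mathbb E_{\vartheta_0}[\log(f_{\vartheta_0})]-\mathbb E_{\vartheta_0}[\log(f_\vartheta)]\big\}=\operatorname*{argmin}_{\vartheta\in\Theta}\Big\{-\frac2n\mathbb E_{\vartheta_0}[\log(f_\vartheta)]\Big\}.$$
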
In our context $f_\vartheta$ denotes the density of the observations $Y^n$. The problem is that the right-hand side is not directly calculable so that we have to approximate it. To this end, let $\mathcal Y^n$ be an independent copy of $Y^n$ and $\widehat\vartheta^n(Y^n)$ be the QMLE in $\Theta$ based on the observation $Y^n$. Then we use the approximation


min 

)>€© 


-^Ei^[log (/^)] 


--E^[log(/g„(y„P I F"] 


E 


.if(t?"(F"),^”) I F" 


-?E[iog(/j.,,.,(sr")) I S'"] 


(5.1) 


The right-hand side can again be approximated by the following theorem: 

Theorem 5.1. Assume that the space 0 with associated family of continuous-time state space models 
satisfies Assumption B Then, as n^ oo, 






• Eij*, 


where Z^* is a random variable with expectation E[Z^*] = 0. In particular, as n 


.if(t?”(F”),^”)- 




0 . 


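Proof. A second-order Taylor expansion of $\widehat{\mathcal L}(\widehat\vartheta^n(\mathcal Y^n),Y^n)$ around $\widehat\vartheta^n(Y^n)$ gives

$$\widehat{\mathcal L}(\widehat\vartheta^n(\mathcal Y^n),Y^n)=\widehat{\mathcal L}(\widehat\vartheta^n(Y^n),Y^n)+\frac12\big(\widehat\vartheta^n(\mathcal Y^n)-\widehat\vartheta^n(Y^n)\big)^T\nabla^2_\vartheta\widehat{\mathcal L}(\widetilde\vartheta^n,Y^n)\big(\widehat\vartheta^n(\mathcal Y^n)-\widehat\vartheta^n(Y^n)\big),$$

where $\|\widetilde\vartheta^n-\widehat\vartheta^n(Y^n)\|\le\|\widehat\vartheta^n(\mathcal Y^n)-\widehat\vartheta^n(Y^n)\|$. Hence,

$$\widehat{\mathcal L}(\widehat\vartheta^n(\mathcal Y^n),Y^n)-\widehat{\mathcal L}(\widehat\vartheta^n(Y^n),Y^n)
=\frac12\operatorname{tr}\Big(\nabla^2_\vartheta\widehat{\mathcal L}(\widetilde\vartheta^n,Y^n)\big(\widehat\vartheta^n(\mathcal Y^n)-\widehat\vartheta^n(Y^n)\big)\big(\widehat\vartheta^n(\mathcal Y^n)-\widehat\vartheta^n(Y^n)\big)^T\Big).$$

On the one hand, since both $\widehat\vartheta^n(Y^n)$ and $\widehat\vartheta^n(\mathcal Y^n)$ converge $\mathbb P$-a.s. to $\vartheta^*$, the vector $\widetilde\vartheta^n\to\vartheta^*$ $\mathbb P$-a.s. as well. On the other hand, by the independence of $Y^n$ and $\mathcal Y^n$, the random vectors $\widehat\vartheta^n(\mathcal Y^n)$ and $\widehat\vartheta^n(Y^n)$ are independent as well. By Theorem 3.7, as $n\to\infty$,

$$\Big(\sqrt n\big(\widehat\vartheta^n(Y^n)-\vartheta^*\big),\ \sqrt n\big(\widehat\vartheta^n(\mathcal Y^n)-\vartheta^*\big)\Big)\xrightarrow{\ \mathcal D\ }(\mathcal N_1,\mathcal N_2),$$

where $\mathcal N_1,\mathcal N_2$ are independent, $\mathcal N(0_{N(\Theta)},\mathcal J^{-1}(\vartheta^*)\mathcal I(\vartheta^*)\mathcal J^{-1}(\vartheta^*))$-distributed random vectors. A conclusion of Proposition 3.5(d) is $\nabla^2_\vartheta\widehat{\mathcal L}(\widetilde\vartheta^n,Y^n)\to\mathcal J(\vartheta^*)$ $\mathbb P$-a.s. Hence, a continuous mapping theorem gives

$$n\Big(\widehat{\mathcal L}(\widehat\vartheta^n(\mathcal Y^n),Y^n)-\widehat{\mathcal L}(\widehat\vartheta^n(Y^n),Y^n)\Big)\xrightarrow{\ \mathcal D\ }\frac12\operatorname{tr}\big(\mathcal J(\vartheta^*)(\mathcal N_2-\mathcal N_1)(\mathcal N_2-\mathcal N_1)^T\big)$$

and by the independence of $\mathcal N_1$ and $\mathcal N_2$ we have

$$\mathbb E\big[\mathcal J(\vartheta^*)(\mathcal N_2-\mathcal N_1)(\mathcal N_2-\mathcal N_1)^T\big]=2\mathcal J(\vartheta^*)\mathbb E\big[\mathcal N_1\mathcal N_1^T\big]=2\mathcal J(\vartheta^*)\mathcal J^{-1}(\vartheta^*)\mathcal I(\vartheta^*)\mathcal J^{-1}(\vartheta^*).$$

The statement then follows since the expectation of the trace is the trace of the expectation. □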


As a consequence of (5.1) and Theorem 5.1 we obtain the approximation


mm 

i?e0 


--E^[log (/^)] 






which becomes our information criterion via the following definition: 

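Definition 5.2. For a space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ that satisfies Assumption B the Akaike Information Criterion (AIC) is defined as

$$AIC_n(\Theta):=\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)+\frac{\operatorname{tr}\big(\mathcal I(\vartheta^*)\mathcal J^{-1}(\vartheta^*)\big)}{n}.$$

In general $\mathcal I(\vartheta^*)$ and $\mathcal J(\vartheta^*)$ are not known. For practical purposes, they have to be estimated. For both, estimators are known and can be found at the end of [28, Section 2.2], for example.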


Remark 5.3. If the Lévy process $L$ which drives the observed process $Y$ is a Brownian motion and $\mathrm{MCARMA}(A_{\vartheta^*},B_{\vartheta^*},C_{\vartheta^*},L_{\vartheta^*})=Y$, we have by Lemma A.5(a) that $\mathcal I(\vartheta^*)=2\mathcal J(\vartheta^*)$ and hence, the AIC reduces to

$$AIC_n(\Theta)=\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)+\frac{2N(\Theta)}{n}.$$

The form of the AIC given in this remark coincides with Akaike's original definition (cf. [1]). For these reasons, it is natural to define an alternative version of the AIC as follows:

Definition 5.4. For a space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ that satisfies Assumption B the Classical Akaike Information Criterion (CAIC) is defined as

$$CAIC_n(\Theta):=\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)+\frac{2N(\Theta)}{n}.$$

This criterion avoids the additional work of estimating the matrices $\mathcal I(\vartheta^*)$ and $\mathcal J(\vartheta^*)$ appearing in the AIC, which comes at the cost of not being exact when the driving Lévy process is not a Brownian motion. For both versions of the AIC, we can immediately make a statement about consistency:


Theorem 5.5. Both the AIC and the CAIC are neither strongly nor weakly consistent. 


Proof. The CAIC is a special case of $IC_n$ with $C(n)=2$ such that the assertion follows from Theorem 4.4(b). For the AIC, the proof of Theorem 4.4(b) can be directly adapted. □


5.2 The Bayesian Information Criterion (BIC) 

Another information criterion which appears often in the literature is the so-called Bayesian Information Criterion (BIC), sometimes also called SIC, an abbreviation for Schwarz Information Criterion, named after the author who originally introduced it in [29]. Another often-cited article in this context is [25], which introduces an equivalent criterion in a slightly different context based on coding theory. As the name Bayesian Information Criterion already suggests, the approach of the definition is based on Bayesian statistics. Our derivation is based on [12], relying on properties of the likelihood function. Suppose that $\pi$ is a discrete prior probability distribution over the set of candidate spaces $\Theta$ and $\pi(\Theta)>0$ for every parameter space $\Theta$ which will be considered. Moreover, suppose that $g(\cdot\mid\Theta)$ is a prior probability distribution over the parameter space $\Theta$. For $g$ we require the following assumption.


Assumption C. For every space $\Theta$ there exist two constants $b$ and $B$ with $0<b\le B<\infty$ such that $0\le g(\vartheta\mid\Theta)\le B$ for all $\vartheta\in\Theta$ and $b\le g(\vartheta\mid\Theta)$ for all $\vartheta$ in some neighborhood of the pseudo-true parameter $\vartheta^*\in\Theta$.


Now we can apply Bayes' theorem to obtain the joint posterior probability distribution $f$ of $\Theta$ and $\vartheta$ which is

$$f(\Theta,\vartheta\mid Y^n)=\frac{\pi(\Theta)\,g(\vartheta\mid\Theta)\,f(Y^n\mid\Theta,\vartheta)}{h(Y^n)},\qquad(5.2)$$

where $h(\cdot)$ denotes the (unknown) marginal density of $Y^n$. With this, we can calculate the a posteriori probability of space $\Theta$ as

$$\mathbb P(\Theta\mid Y^n)=\int_\Theta f(\Theta,\vartheta\mid Y^n)\,d\vartheta.\qquad(5.3)$$

The idea is to choose the most probable model for the data at hand, i.e. the space $\Theta$ which maximizes the a posteriori probability. Similar to the derivation of the AIC, the task is now to find a good approximation of (5.3) which is directly calculable from the data. For this note first that maximization of (5.3) is equivalent to minimizing $-2/n$ times the logarithm of $\mathbb P(\Theta\mid Y^n)$. Applying this transformation and plugging in (5.2) gives

$$-\frac2n\log(\mathbb P(\Theta\mid Y^n))=\frac2n\log(h(Y^n))-\frac2n\log(\pi(\Theta))-\frac2n\log\Big(\int_\Theta f(Y^n\mid\Theta,\vartheta)\,g(\vartheta\mid\Theta)\,d\vartheta\Big).\qquad(5.4)$$

We choose the parameter space $\Theta$ with the lowest value of $-\frac2n\log(\mathbb P(\Theta\mid Y^n))$. Hence, we have to approximate this expression. For this, we approximate the unknown density $f(Y^n\mid\Theta,\vartheta)$ by the pseudo-Gaussian likelihood function $\exp(-\frac n2\widehat{\mathcal L}(\vartheta,Y^n))$ and use the following theorem.
Theorem 5.6. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B and the a priori density $g$ satisfies Assumption C. Then


n n n yj© 

< +A^(0)i^^ + 

n n 

where $R_1(N(\Theta))$ and $R_2(N(\Theta))$ are remainder terms which do not depend on $n$. In particular,


--log(P(0 I F”)) = .F(^",r)+^(0)i^^ + 
n n 


■\og{h{Y^)) + 0 


iog(?i; 


Proof. By Assumption B, Assumption C, Proposition 3.5 and [?, Proposition 3.1] the regularity assumptions in [12] are satisfied so that the statement follows from there. □

The term $\frac2n\log(h(Y^n))$ is the same across all parameter spaces and therefore not relevant for model selection. Based on these ideas, we define the BIC.


Definition 5.7. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Then the Bayesian Information Criterion (BIC) is defined as

$$BIC_n(\Theta):=\widehat{\mathcal L}(\widehat\vartheta^n,Y^n)+N(\Theta)\frac{\log(n)}{n}.$$

As with the AIC, we can immediately make a statement about consistency of the BIC:

Theorem 5.8. The BIC is a strongly consistent information criterion.

Proof. The BIC is a special case of $IC_n$ with $C(n)=\log(n)$. The assertion immediately follows from Theorem 4.4(a), since $\lim_{n\to\infty}\log(n)/\log(\log(n))=\infty$ (see also Remark 4.5(a)). □
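
The different behavior of the two penalties relative to the rate $\log(\log(n))$ from Theorem 4.4(a) can be made visible numerically. The following Python sketch tabulates $C(n)/\log(\log(n))$ for $C(n)=\log(n)$ and $C(n)=2$; the function name and the chosen sample sizes are assumptions of this illustration, and a finite table can of course only suggest, not prove, the limiting behavior.

```python
import math


def penalty_ratio(penalty, n):
    """Return C(n) / log(log(n)), the quantity appearing in Theorem 4.4(a)."""
    return penalty(n) / math.log(math.log(n))


sample_sizes = [10**2, 10**4, 10**6, 10**9, 10**12]

# C(n) = log(n): the ratio keeps growing.
log_ratios = [penalty_ratio(math.log, n) for n in sample_sizes]
# C(n) = 2: the ratio decays towards zero.
constant_ratios = [penalty_ratio(lambda m: 2.0, n) for n in sample_sizes]

for n, r_log, r_const in zip(sample_sizes, log_ratios, constant_ratios):
    print(f"n = {n:>14d}   log(n)/loglog(n) = {r_log:6.3f}   2/loglog(n) = {r_const:6.3f}")
```

The slow growth of the first column also indicates why very large samples may be needed before the asymptotic ordering of the penalties dominates in practice.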


6 Simulation study 

The results on information criteria obtained in the previous sections will now be illustrated by a 
simulation study. In this context we would like to thank Eckhard Schlemm and Robert Stelzer who 
kindly provided the MATLAB code for the simulation and parameter estimation of the MCARMA 
process. As before, we use the Echelon MCARMA parametrization in the simulations. We simulate 
a two-dimensional MCARMA process with Kronecker index $m_0=(1,2)$ for two parameter values. One is an MCARMA(2,0) process with parameter

$$\vartheta_0^{(1)}=(-1\ \ -2\ \ 1\ \ -2\ \ -3\ \ 0\ \ 0).$$

The other is an MCARMA(2,1) process with parameter

$$\vartheta_0^{(2)}=(-1\ \ -2\ \ 1\ \ -2\ \ -3\ \ 1\ \ 2).$$

As driving Lévy process, we use, on the one hand, a two-dimensional, correlated Brownian motion and, on the other hand, a two-dimensional, normal-inverse Gaussian (NIG) process. For the NIG process the increments $L(t)-L(t-1)$ have the density

$$f_{NIG}(x;\mu,\alpha,\beta,\delta,\Delta)=\frac{\delta e^{\delta\kappa}}{2\pi}\,\frac{e^{\langle\beta,x\rangle}}{e^{\alpha g(x)}}\,\frac{1+\alpha g(x)}{g(x)^3},\qquad x\in\mathbb R^2,$$

where

$$g(x)=\sqrt{\delta^2+\langle x-\mu,\Delta(x-\mu)\rangle},\qquad\kappa^2=\alpha^2-\langle\beta,\Delta\beta\rangle.$$

The parameter $\mu\in\mathbb R^2$ is a location parameter, $\alpha>0$ is a shape parameter, $\beta\in\mathbb R^2$ is a symmetry parameter, $\delta>0$ is a scale parameter and $\Delta\in\mathbb R^{2\times2}$ is a positive semidefinite matrix with $\det(\Delta)=1$ that determines the dependence between the components of the Lévy process. In the simulations we use the values


5 = 1, a = 3 , j8 = 


A= ^ 


M = - 


1 




2V^ V) ’ 


which result in a zero-mean process with covariance matrix

$$\Sigma_{NIG}\approx\begin{pmatrix}0.4571&-0.1622\\-0.1622&0.3708\end{pmatrix}.$$


In the case of the Brownian motion the covariance matrix is equal to the covariance matrix 
in the NIG case. In the estimation the number of free parameters includes three parameters for the 
covariance matrix of the driving Levy process. 

The simulation of the continuous-time process is done with the initial value $X(0)=0$, applying the Euler-Maruyama method to the stochastic differential equation (2.2) and then invoking (2.1). For the Euler-Maruyama scheme we operate on the interval $[0,2000]$ and take the step size $0.01$. Afterwards, the simulated process is sampled at discrete points in time with sampling distance $h=1$, resulting in $n=2000$ observations. After obtaining the discrete samples of the MCARMA process we calculate the AIC, CAIC and BIC as defined in Definition 5.2, Definition 5.4 and Definition 5.7, respectively. In the calculation of the AIC we estimate the penalty term $\operatorname{tr}(\mathcal I(\vartheta^*)\mathcal J^{-1}(\vartheta^*))$ by the methods presented in [28, Section 2.2] as well, since in general there is no explicit form of $\mathcal I(\vartheta^*)$ and $\mathcal J(\vartheta^*)$. We consider eight different parameter spaces in total. While some of them differ in the Kronecker index, others differ only by the degree of the MA polynomial of the MCARMA process. We compare the different values of the information criteria and record the space for which the minimum value is attained. The results of 50 replications are summarized in Table 1.
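
The discretize-then-sample procedure described above can be sketched in Python. The example below is not the MATLAB code used for the study: it replaces the two-dimensional MCARMA state space model by a hypothetical one-dimensional Ornstein-Uhlenbeck equation $dX(t)=-aX(t)\,dt+\sigma\,dW(t)$ driven by Brownian motion, and all names and parameter values are assumptions chosen for illustration. Only the grid (interval $[0,2000]$, step size $0.01$, sampling distance $h=1$, initial value $0$) follows the text.

```python
import math
import random


def euler_maruyama_samples(a, sigma, horizon, step, sampling_distance, seed=0):
    """Simulate dX = -a X dt + sigma dW by Euler-Maruyama and sample on a coarse grid.

    The fine grid has mesh `step`; every `sampling_distance / step`-th point
    is kept as a discrete-time observation.
    """
    rng = random.Random(seed)
    n_steps = round(horizon / step)
    steps_per_sample = round(sampling_distance / step)
    sqrt_step = math.sqrt(step)

    x = 0.0  # initial value X(0) = 0
    observations = []
    for i in range(1, n_steps + 1):
        # Brownian increment over one fine step has standard deviation sqrt(step).
        x += -a * x * step + sigma * sqrt_step * rng.gauss(0.0, 1.0)
        if i % steps_per_sample == 0:
            observations.append(x)
    return observations


# Hypothetical drift and volatility; grid settings as described in the text.
observations = euler_maruyama_samples(a=1.0, sigma=0.5, horizon=2000, step=0.01,
                                      sampling_distance=1.0)
print(len(observations))
```

For a multivariate state space model the scalar update would be replaced by the corresponding matrix-vector update, and for the NIG case the Gaussian increment by a draw from the increment distribution of the driving Lévy process.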

As expected because of its strong consistency, the BIC performs convincingly and has a high accuracy in both cases. It even achieves a perfect score in the case where the driving noise is a NIG process and makes one wrong decision in the BM scenario. Furthermore, both versions of the AIC exhibit overfitting. There is a clear difference between the CAIC and the AIC in both cases. From the theory, we know that this should not happen when the driving Lévy process is a Brownian motion since the criteria are then the same. This difference comes from the estimation error incurred by estimating the penalty term $\operatorname{tr}(\mathcal I(\vartheta^*)\mathcal J^{-1}(\vartheta^*))$ in the AIC. We observe that in the Gaussian model the estimation error of the penalty term is usually higher for model number 3 than for model 2 (relative to the true values), which results in a higher overfitting rate for the AIC. We also calculate the overfitting probability in the Brownian motion case as given in Theorem 4.4(c). For this, note that there is only one parameter space in which the true one is nested (space number 2) and for that space we have $C=2$,
Space  Model                      BM                   NIG
       m      p  q  N(Θ)    AIC  CAIC  BIC      AIC  CAIC  BIC
1      (1,1)  1  0    7       0     0    0        0     0    0
2      (1,2)  2  1   10      14     8    1       10     4    0
3      (1,2)  2  0    8      36    42   49       40    46   50
4      (2,1)  2  1   11       0     0    0        0     0    0
5      (2,1)  2  0    9       0     0    0        0     0    0
6      (2,2)  2  1   15       0     0    0        0     0    0
7      (2,2)  2  0   11       0     0    0        0     0    0
8      (3,2)  3  2   19       0     0    0        0     0    0

Table 1: Results for the true parameter $\vartheta_0^{(1)}$.


$N(\Theta)-N(\Theta_0)=2$ and the entries of $F$ are given by

$$F_{ij}=\begin{cases}1&\text{if }i=j\text{ and }i\in\{1,2,3,4,5,8,9,10\},\\0&\text{otherwise.}\end{cases}$$

The strictly positive eigenvalues of the matrix from Theorem 4.4(c) are calculated with the help of MATLAB and turn out to be both equal to 2, so that the overfitting probability simplifies to approximately 0.1573. The empirical probability $8/50=0.16$ of overfitting in the CAIC is very close. The results of the simulation study for $\vartheta_0^{(2)}$ are given in Table 2.


Space  Model                      BM                   NIG
       m      p  q  N(Θ)    AIC  CAIC  BIC      AIC  CAIC  BIC
1      (1,1)  1  0    7       0     0    0        0     0    0
2      (1,2)  2  1   10      50    50   50       50    50   50
3      (1,2)  2  0    8       0     0    0        0     0    0
4      (2,1)  2  1   11       0     0    0        0     0    0
5      (2,1)  2  0    9       0     0    0        0     0    0
6      (2,2)  2  1   15       0     0    0        0     0    0
7      (2,2)  2  0   11       0     0    0        0     0    0
8      (3,2)  3  2   19       0     0    0        0     0    0

Table 2: Results for the true parameter $\vartheta_0^{(2)}$.


As we can see, all the information criteria perform perfectly. There are no effects of overfitting, which is not surprising considering the fact that the true parameter is chosen in such a way that it is not contained in any of the other spaces besides space number 2, so that the scenario from Remark 4.5(d) is given.
A Appendix

A.1 Auxiliary results for Section 3

We summarize some auxiliary results which are used throughout the paper. We start with a lemma 
giving moving average representations of the pseudo-innovations and their derivatives. 

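Lemma A.1. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumptions B.1 to B.9.

(a) There exists a matrix sequence $(c_{\vartheta,k})_{k\in\mathbb N}$ such that

$$\varepsilon_{\vartheta,k}=Y(kh)+\sum_{\nu=1}^\infty c_{\vartheta,\nu}Y((k-\nu)h),\qquad k\in\mathbb Z.$$

Furthermore, there exists a positive constant $C$ and a constant $\rho\in(0,1)$ such that

$$\sup_{\vartheta\in\Theta}\|c_{\vartheta,k}\|\le C\rho^k,\qquad k\in\mathbb N.$$

(b) For each $i\in\{1,\dots,N(\Theta)\}$, there exists a matrix sequence $(c^{(i)}_{\vartheta,k})_{k\in\mathbb N}$ such that

$$\partial_i\varepsilon_{\vartheta,k}=\sum_{\nu=1}^\infty c^{(i)}_{\vartheta,\nu}Y((k-\nu)h),\qquad k\in\mathbb Z.$$

Furthermore, there exists a positive constant $C$ and a constant $\rho\in(0,1)$ such that

$$\sup_{\vartheta\in\Theta}\|c^{(i)}_{\vartheta,k}\|\le C\rho^k,\qquad k\in\mathbb N.$$

(c) For each $i,j\in\{1,\dots,N(\Theta)\}$, there exists a matrix sequence $(c^{(i,j)}_{\vartheta,k})_{k\in\mathbb N}$ such that

$$\partial^2_{i,j}\varepsilon_{\vartheta,k}=\sum_{\nu=1}^\infty c^{(i,j)}_{\vartheta,\nu}Y((k-\nu)h),\qquad k\in\mathbb Z.$$

Furthermore, there exists a positive constant $C$ and a constant $\rho\in(0,1)$ such that

$$\sup_{\vartheta\in\Theta}\|c^{(i,j)}_{\vartheta,k}\|\le C\rho^k,\qquad k\in\mathbb N.$$

Proof. Part (a) is [28, Lemma 2.6 ii)], part (b) is [28, Lemma 2.11 ii)] and part (c) is [28, Lemma 2.11 iv)], where we additionally use [28, Lemma 3.14]. □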

In the next step, we show that it does not matter whether we consider the approximate pseudo¬ 
innovations or the pseudo-innovations. 

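Lemma A.2. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumptions B.1 to B.9. If for $i,j\in\{1,\dots,N(\Theta)\}$ the initial values $\widehat X_{\vartheta,1}$ are such that $\sup_{\vartheta\in\Theta}\|\widehat X_{\vartheta,1}\|$, $\sup_{\vartheta\in\Theta}\|\partial_i\widehat X_{\vartheta,1}\|$ and $\sup_{\vartheta\in\Theta}\|\partial^2_{i,j}\widehat X_{\vartheta,1}\|$ are almost surely finite, then it holds: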
(a) sup^gQ

(b) 

(c) sup^gQ {t},Y'') - dlj^ {t},Y") 

(d) sup^g0E[^(T?,F”)-Jf(T?,n 


0 as oo P-a.5. 

> 0 as n ^ oo. 

0 as oo P-a.i'. 


0 as n 


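Proof. (a) is [28, Lemma 2.7] taking [28, Lemma 3.14] into account. The proof of (b) and (c) follows in the same way by using [28, Lemma 2.11].

(d) As in the proof of [28, Lemma 2.7], we have $\sup_{\vartheta\in\Theta}\mathbb E\|\varepsilon_{\vartheta,k}\|<\infty$, $\sup_{\vartheta\in\Theta}\mathbb E\|\widehat\varepsilon_{\vartheta,k}\|<\infty$ and for some $\rho\in(0,1)$ the behavior

$$\sup_{\vartheta\in\Theta}\mathbb E\big|\widehat{\mathcal L}(\vartheta,Y^n)-\mathcal L(\vartheta,Y^n)\big|\le\frac Cn\sum_{k=1}^n\rho^k\sup_{\vartheta\in\Theta}\big(\mathbb E\|\varepsilon_{\vartheta,k}\|+\mathbb E\|\widehat\varepsilon_{\vartheta,k}\|\big)\xrightarrow{n\to\infty}0.$$

□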


We conclude this section with another lemma, which plays a role in the proof of consistency of 
information criteria for MCARMA processes. 

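Lemma A.3. Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B. Let $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$. Then for every $\vartheta\in\Theta$

$$\mathcal Q(\vartheta)-\mathcal Q(\vartheta_0)\ge\operatorname{tr}\Big(V_\vartheta^{-1}\,\mathbb E\big[(\varepsilon_{\vartheta,1}-\varepsilon_{\vartheta_0,1})(\varepsilon_{\vartheta,1}-\varepsilon_{\vartheta_0,1})^T\big]\Big)\ge0.\qquad(A.1)$$

Furthermore, if $\mathrm{MCARMA}(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)\neq Y$, then

$$\operatorname{tr}\Big(V_\vartheta^{-1}\,\mathbb E\big[(\varepsilon_{\vartheta,1}-\varepsilon_{\vartheta_0,1})(\varepsilon_{\vartheta,1}-\varepsilon_{\vartheta_0,1})^T\big]\Big)>0.$$

Proof. The proof is given in [28, Lemma 2.10]. □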


A.2 Auxiliary results for Section 4

In this appendix, we give the calculations for the Brownian motion case in Section 4.

Lemma A.4. Let $A,B\in\mathbb R^{d\times d}$ be matrices, where $B$ is symmetric. Then

$$\operatorname{tr}\big((\operatorname{vec}(I_{d\times d})\otimes\operatorname{vec}(I_{d\times d})^T)(A\otimes B)\big)=\operatorname{tr}(AB).$$

Proof. The proof can be derived by straightforward algebraic calculations. □
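
The identity of Lemma A.4 is easy to check numerically. The following Python sketch does so for one pair of $2\times2$ matrices; the helper functions and the example matrices are assumptions of this illustration. It uses that the Kronecker product of a column vector with a row vector equals their outer product, and it also shows one non-symmetric $B$ for which the two sides differ.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def lemma_lhs(a, b):
    """tr((vec(I) (x) vec(I)^T)(A (x) B)) for square matrices of equal size."""
    d = len(a)
    # Column-stacked identity matrix.
    v = [1 if i == j else 0 for j in range(d) for i in range(d)]
    # vec(I) (x) vec(I)^T coincides with the outer product vec(I) vec(I)^T.
    outer = [[vi * vj for vj in v] for vi in v]
    return trace(matmul(outer, kron(a, b)))


A = [[1, 2], [3, 4]]
B_symmetric = [[2, -1], [-1, 5]]
B_nonsymmetric = [[2, -1], [0, 5]]

print(lemma_lhs(A, B_symmetric), trace(matmul(A, B_symmetric)))
print(lemma_lhs(A, B_nonsymmetric), trace(matmul(A, B_nonsymmetric)))
```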


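Lemma A.5. Assume that the Lévy process $L$ which drives the observed process $Y$ is a Brownian motion.

(a) Assume that the space $\Theta$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta}$ satisfies Assumption B and that $\mathrm{MCARMA}(A_{\vartheta^*},B_{\vartheta^*},C_{\vartheta^*},L_{\vartheta^*})=Y$ for the pseudo-true parameter $\vartheta^*$. Then

$$\mathcal I(\vartheta^*)=2\mathcal J(\vartheta^*).$$
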
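(b) There exists a space $\Theta_0$ with associated family of continuous-time state space models $(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0}$ satisfying Assumption B such that $\mathrm{MCARMA}(A_{\vartheta_0},B_{\vartheta_0},C_{\vartheta_0},L_{\vartheta_0})=Y$ for some $\vartheta_0\in\Theta_0$. Moreover, $\Theta_0$ is nested in $\Theta_0^E$ with map $F$, $N(\Theta_0)=N(\Theta_0^E)-1$ and

$$\lambda_{\max}\big(\mathcal I(\vartheta_0^E)^{1/2}M_F(\vartheta_0^E)\mathcal I(\vartheta_0^E)^{1/2}\big)=2,\qquad M_F(\vartheta):=\mathcal J(\vartheta)^{-1}-F(F^T\mathcal J(\vartheta)F)^{-1}F^T.$$
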
Proof (a) An analogous statement for vector ARMA processes is given in |0 Remark 2]. However, 
they state it without a proof. Since the proof is not so obvious we decided to sketch it here for 
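MCARMA processes. First, note that since the driving Lévy process is a Brownian motion, it holds by construction that the linear innovations $(\varepsilon_k)_{k\in\mathbb{Z}}$ of the process $(Y(kh))_{k\in\mathbb{Z}}$ are i.i.d. $\mathcal{N}(0,V)$-distributed (cf. Definition [ sa]). Moreover, by assumption it also holds that $\varepsilon_{\vartheta^*,k} = \varepsilon_k$ for every $k\in\mathbb{Z}$, hence we also have that $\varepsilon_{\vartheta^*,k}\sim\mathcal{N}(0,V_{\vartheta^*})$ and $V_{\vartheta^*} = V$. By definition

$$
\mathcal{I}(\vartheta^*) = \lim_{n\to\infty} n\operatorname{Var}\left(\nabla_\vartheta\mathscr{L}(\vartheta^*,Y^n)\right),
$$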

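which means that for $i,j\in\{1,\dots,N(\Theta)\}$ we have to study terms of the form

$$
\begin{aligned}
n\operatorname{Var}\left(\nabla_\vartheta\mathscr{L}(\vartheta^*,Y^n)\right)_{ij}
={}& \frac{1}{n}\sum_{k=1}^{n}\mathbb{E}\Big[\left(\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)-\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)+2\,\partial_i\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\right)\\
&\qquad\cdot\left(\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)-\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)+2\,\partial_j\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\right)\Big]\\
&+\frac{1}{n}\sum_{k=1}^{n}\sum_{\substack{l=1\\ l\neq k}}^{n}\mathbb{E}\Big[\left(\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)-\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)+2\,\partial_i\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\right)\\
&\qquad\cdot\left(\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)-\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,l}\varepsilon_{\vartheta^*,l}^{T}V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)+2\,\partial_j\varepsilon_{\vartheta^*,l}^{T}V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,l}\right)\Big]\\
=:{}& \frac{1}{n}\sum_{k=1}^{n}a_k+\frac{1}{n}\sum_{k=1}^{n}\sum_{\substack{l=1\\ l\neq k}}^{n}b_{k,l}.
\end{aligned}
$$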

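We start to investigate $a_k$. By definition, every innovation $\varepsilon_{\vartheta^*,k}$ is orthogonal to $\operatorname{span}\{Y(jh) : -\infty < j < k\}$ and by Lemma A.1(b) both $\partial_i\varepsilon_{\vartheta^*,k}$ and $\partial_j\varepsilon_{\vartheta^*,k}$ are elements of $\operatorname{span}\{Y(jh) : -\infty < j < k\}$. Since all these random variables are jointly Gaussian, $\varepsilon_{\vartheta^*,k}$ is hence independent of $\partial_i\varepsilon_{\vartheta^*,k}$ and $\partial_j\varepsilon_{\vartheta^*,k}$. This, together with the independence of the innovation sequence $(\varepsilon_{\vartheta^*,k})_{k\in\mathbb{Z}}$, the facts that $\mathbb{E}[\partial_i\varepsilon_{\vartheta^*,k}] = 0$ and $\mathbb{E}[\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}] = V_{\vartheta^*}$, and the interchangeability of trace and expectation, allows us to simplify

$$
\begin{aligned}
a_k ={}& -\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)
+\mathbb{E}\left[\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)\right]\\
&+4\,\mathbb{E}\left[\partial_i\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\,\partial_j\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\right]\\
=:{}& a_k^{(1)}+a_k^{(2)}+a_k^{(3)}.
\end{aligned}
\tag{A.2}
$$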


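For the second term, we define $\tilde\varepsilon_{\vartheta^*,k} = V_{\vartheta^*}^{-1/2}\varepsilon_{\vartheta^*,k}\sim\mathcal{N}(0,I_{d\times d})$ and have by standard calculation rules for Kronecker products ([4, Proposition 7.1.6 and Proposition 7.1.12]):

$$
\begin{aligned}
a_k^{(2)} &= \mathbb{E}\left[\operatorname{tr}\left(\left(\tilde\varepsilon_{\vartheta^*,k}\tilde\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1/2}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\right)\otimes\left(\tilde\varepsilon_{\vartheta^*,k}\tilde\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1/2}\partial_j V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\right)\right)\right]\\
&= \operatorname{tr}\left(\mathbb{E}\left[\tilde\varepsilon_{\vartheta^*,k}\tilde\varepsilon_{\vartheta^*,k}^{T}\otimes\tilde\varepsilon_{\vartheta^*,k}\tilde\varepsilon_{\vartheta^*,k}^{T}\right]\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\left(\partial_i V_{\vartheta^*}\otimes\partial_j V_{\vartheta^*}\right)\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\right).
\end{aligned}
$$

Since $\tilde\varepsilon_{\vartheta^*,k}\sim\mathcal{N}(0,I_{d\times d})$, by means of IS Theorem 1] the expectation appearing in the last line is

$$
\mathbb{E}\left[\tilde\varepsilon_{\vartheta^*,k}\tilde\varepsilon_{\vartheta^*,k}^{T}\otimes\tilde\varepsilon_{\vartheta^*,k}\tilde\varepsilon_{\vartheta^*,k}^{T}\right] = K_{d,d}+I_{d^2\times d^2}+\operatorname{vec}(I_{d\times d})\otimes\operatorname{vec}(I_{d\times d})^{T},
$$

where $K_{d,d}$ is the Kronecker permutation matrix ([4, Eq. (7.1.20)]). Together with the linearity and the cyclic permutation property of the trace, we use this to obtain

$$
\begin{aligned}
a_k^{(2)} ={}& \operatorname{tr}\left(K_{d,d}\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\left(\partial_i V_{\vartheta^*}\otimes\partial_j V_{\vartheta^*}\right)\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\right)\\
&+\operatorname{tr}\left(\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\left(\partial_i V_{\vartheta^*}\otimes\partial_j V_{\vartheta^*}\right)\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\right)\\
&+\operatorname{tr}\left(\left(\operatorname{vec}(I_{d\times d})\otimes\operatorname{vec}(I_{d\times d})^{T}\right)\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\left(\partial_i V_{\vartheta^*}\otimes\partial_j V_{\vartheta^*}\right)\left(V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\right)\right)\\
={}& \operatorname{tr}\left(K_{d,d}\left(V_{\vartheta^*}^{-1/2}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\partial_j V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\right)\right)
+\operatorname{tr}\left(V_{\vartheta^*}^{-1/2}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\partial_j V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\right)\\
&+\operatorname{tr}\left(\left(\operatorname{vec}(I_{d\times d})\otimes\operatorname{vec}(I_{d\times d})^{T}\right)\left(V_{\vartheta^*}^{-1/2}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\otimes V_{\vartheta^*}^{-1/2}\partial_j V_{\vartheta^*}V_{\vartheta^*}^{-1/2}\right)\right).
\end{aligned}
$$

We now apply Lemma A.4 as well as [4, Fact 7.4.30 xviii) and Proposition 7.1.12] to get

$$
a_k^{(2)} = 2\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)+\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}\right)\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right).
$$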
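
The trace computation for the second term can be sanity-checked numerically. The sketch below is an illustration in Python with NumPy, not taken from the paper; the names `commutation_matrix`, `gaussian_fourth_moment` and `second_term` are hypothetical, and the symmetric matrices `P` and `Q` merely stand in for $V_{\vartheta^*}^{-1/2}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1/2}$ and $V_{\vartheta^*}^{-1/2}\partial_j V_{\vartheta^*}V_{\vartheta^*}^{-1/2}$. It assumes the column-stacking convention for vec, under which the Kronecker permutation matrix satisfies $K_{d,d}\operatorname{vec}(M)=\operatorname{vec}(M^{T})$.

```python
import numpy as np


def vec(m):
    # Column-stacking vectorisation (assumed convention), returned as a column.
    return m.reshape(-1, 1, order="F")


def commutation_matrix(d):
    # Kronecker permutation matrix K with K @ vec(M) == vec(M.T).
    k = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            k[i * d + j, j * d + i] = 1.0
    return k


def gaussian_fourth_moment(d):
    # E[e e^T (x) e e^T] for e ~ N(0, I_d): K + I + vec(I) vec(I)^T.
    v = vec(np.eye(d))
    return commutation_matrix(d) + np.eye(d * d) + v @ v.T


def second_term(p, q):
    # Hypothetical helper: trace form versus closed form for symmetric p, q.
    d = p.shape[0]
    via_kronecker = np.trace(gaussian_fourth_moment(d) @ np.kron(p, q))
    closed_form = 2.0 * np.trace(p @ q) + np.trace(p) * np.trace(q)
    return via_kronecker, closed_form


rng = np.random.default_rng(1)
d = 3
G = rng.standard_normal((d, d))
H = rng.standard_normal((d, d))
P = G + G.T   # symmetric stand-in for the i-th matrix
Q = H + H.T   # symmetric stand-in for the j-th matrix

via_kronecker, closed_form = second_term(P, Q)
print(via_kronecker, closed_form)   # the two values agree up to rounding
```

Because the trace is cyclic, $\operatorname{tr}(PQ)$ and $\operatorname{tr}(P)\operatorname{tr}(Q)$ for these stand-ins coincide with the expressions in $V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}$ and $V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}$ that appear in the final formula.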

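It remains to consider $a_k^{(3)}$ in (A.2). The independence of $\varepsilon_{\vartheta^*,k}$ and $\partial_j\varepsilon_{\vartheta^*,k}\partial_i\varepsilon_{\vartheta^*,k}^{T}$, the cyclic permutation property of the trace and the interchangeability of expectation and trace lead to

$$
\begin{aligned}
\tfrac{1}{4}a_k^{(3)} &= \mathbb{E}\left[\operatorname{tr}\left(V_{\vartheta^*}^{-1}\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_j\varepsilon_{\vartheta^*,k}\partial_i\varepsilon_{\vartheta^*,k}^{T}\right)\right]\\
&= \operatorname{tr}\left(V_{\vartheta^*}^{-1}\,\mathbb{E}\left[\varepsilon_{\vartheta^*,k}\varepsilon_{\vartheta^*,k}^{T}\right]V_{\vartheta^*}^{-1}\,\mathbb{E}\left[\partial_j\varepsilon_{\vartheta^*,k}\partial_i\varepsilon_{\vartheta^*,k}^{T}\right]\right)\\
&= \operatorname{tr}\left(V_{\vartheta^*}^{-1}\,\mathbb{E}\left[\partial_j\varepsilon_{\vartheta^*,k}\partial_i\varepsilon_{\vartheta^*,k}^{T}\right]\right)\\
&= \mathbb{E}\left[\partial_i\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_j\varepsilon_{\vartheta^*,k}\right].
\end{aligned}
$$

Combining those calculations finally results in

$$
a_k^{(1)}+a_k^{(2)}+a_k^{(3)} = 2\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)+4\,\mathbb{E}\left[\partial_i\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_j\varepsilon_{\vartheta^*,k}\right].
$$

By similar calculations, we can verify that $b_{k,l} = 0$ for $k\neq l$.

Finally, this implies

$$
\left(\mathcal{I}(\vartheta^*)\right)_{ij} = a_k = 2\operatorname{tr}\left(V_{\vartheta^*}^{-1}\partial_i V_{\vartheta^*}V_{\vartheta^*}^{-1}\partial_j V_{\vartheta^*}\right)+4\,\mathbb{E}\left[\partial_i\varepsilon_{\vartheta^*,k}^{T}V_{\vartheta^*}^{-1}\partial_j\varepsilon_{\vartheta^*,k}\right].
$$

By [28, (2.33a) and (2.33b)], this term is equal to $\left(2\mathcal{J}(\vartheta^*)\right)_{ij}$ as claimed.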

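(b) Denote by $v_1,\dots,v_{N(\Theta_E)}$ the eigenvectors of which are an orthonormal basis of $\mathbb{R}^{N(\Theta_E)}$. Define $F = (v_1,\dots,v_{N(\Theta_E)-1})\in\mathbb{R}^{N(\Theta_E)\times(N(\Theta_E)-1)}$ and let $\Theta_0\subset\mathbb{R}^{N(\Theta_E)-1}$ be compact such that $F\Theta_0+(\vartheta_E^*-FF^{T}\vartheta_E^*)\subset\Theta_E$ and $F^{T}\vartheta_E^*\in\Theta_0$. Define

$$
(A_\vartheta,B_\vartheta,C_\vartheta,L_\vartheta)_{\vartheta\in\Theta_0} := \left(A_{F\vartheta+(\vartheta_E^*-FF^{T}\vartheta_E^*)},\,B_{F\vartheta+(\vartheta_E^*-FF^{T}\vartheta_E^*)},\,C_{F\vartheta+(\vartheta_E^*-FF^{T}\vartheta_E^*)},\,L_{F\vartheta+(\vartheta_E^*-FF^{T}\vartheta_E^*)}\right)_{\vartheta\in\Theta_0}.
$$

Then $\vartheta_0 = F^{T}\vartheta_E^*$, $\Theta_0$ is nested in $\Theta_E$ with map $F$ and satisfies Assumption B, and $N(\Theta_0) = N(\Theta_E)-1$. Moreover, the eigenvectors $v_1,\dots,v_{N(\Theta_E)-1}$ are basis vectors of the image of $F$ and $v_{N(\Theta_E)}$ is a basis of the kernel of $F^{T}$. Then $v_{N(\Theta_E)}$ is an eigenvector of (r9-,f ) 2 ^(r 9 -,f )2 for the eigenvalue 2 and $v_1,\dots,v_{N(\Theta_E)-1}$ are eigenvectors of this matrix for the eigenvalue 0 as well. □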